Death of the Data Warehouse
Two panelists at the Buy-Side Technology North American Summit talk about their firms' use of big data lakes in place of data warehouses.
The hunger for data isn't going to be sated anytime soon. But as the quantity of data used by firms climbs, so too do the issues surrounding it.
A good data governance strategy isn't exactly a sexy topic, but it's a necessary one to tackle with the increasing demand for data.
Scott Burleigh, executive director for JPMorgan Asset Management, said his firm made heavy investments in data governance technology about a year and a half ago. Burleigh, who spoke on a panel at this year's Buy-Side Technology North American Summit, said the firm found it held multiple copies of data and that the same data was being processed over and over again in different places.
"What evolved over time was that we didn't have a single version of the truth," Burleigh said. "You had different answers for the same instrument. Different rights and returns for the same security. You had weighted average credit ratings that were different between reports. Multiple answers for the same question."
Trip to the Lake
A consolidated area to store the data was the answer, but not via a warehouse. Instead, the firm chose to build a big data lake.
Rashmi Gupta, a data manager at MetLife and fellow panelist, said her firm has taken the exact same approach. Instead of having a traditional centralized warehouse, everything is put into a big data lake, which serves as a data acquisition layer.
A semantics layer, a data translation layer that sits on top of the data acquisition layer, maps to the enterprise data model. Gupta said big data lakes are one of the biggest trends she sees in the industry now.
"So you have one set of information, one single version of truth, but you don't have all the cost associated and the work and labor involved in creating one single warehouse," Gupta said.
It takes very little time to build up big data lakes, according to Gupta, and they have great scalability. If there is a new application a firm wants to use, all it has to do is put it in the lake and build a translation layer on top of it.
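The onboarding step Gupta describes can be illustrated with a minimal Python sketch. It assumes the lake stores raw records per source and that the translation layer is a per-source mapping of native field names to enterprise data model names. All source names, field names, and functions here (`RAW_LAKE`, `FIELD_MAPS`, `translate`, `onboard_source`) are hypothetical and are not taken from either firm's systems.

```python
# Hypothetical raw records as they might land in the lake, keyed by source.
RAW_LAKE = {
    "trading_app": [{"sec_id": "ABC123", "px": 101.5}],
    "risk_app": [{"instrument": "ABC123", "credit_rating": "AA"}],
}

# Translation layer: per-source mapping from native field names to the
# enterprise data model's names (illustrative names only).
FIELD_MAPS = {
    "trading_app": {"sec_id": "security_id", "px": "price"},
    "risk_app": {"instrument": "security_id", "credit_rating": "rating"},
}


def translate(source, record):
    """Rename a raw record's fields to the enterprise model's names.

    Fields with no mapping for this source are dropped.
    """
    mapping = FIELD_MAPS[source]
    return {mapping[k]: v for k, v in record.items() if k in mapping}


def onboard_source(name, records, mapping):
    """Add a new application: land its raw data and register its mapping."""
    RAW_LAKE[name] = records
    FIELD_MAPS[name] = mapping


if __name__ == "__main__":
    # Onboarding needs only the raw data plus one new mapping.
    onboard_source("new_app", [{"id": "XYZ789"}], {"id": "security_id"})
    for source, records in RAW_LAKE.items():
        for record in records:
            print(source, translate(source, record))
```

In this sketch, adding an application leaves the existing sources and mappings untouched, which is one way to read the scalability claim above.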
Gupta said there are some issues around data integrity, which is what makes the translation layer such a critical part of the entire operation.
"It boils down to, very simply put, the whole data warehouse is now being replaced by a high-technology data service layer," Burleigh said.
Tapping at the Source
Burleigh used solvency-related data as an example of how it works. With the data lake, a logical data model brings in data from multiple sources. The data is delivered through a service layer, meaning the user can ask for the type of data or data elements without specifying the source.
"You just talk to the service layer, tell it what data elements you want and it knows where they are," Burleigh said. "It serves it up to you as though it was one source."
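A rough Python sketch of the idea Burleigh describes, in which a caller names the data elements it wants and the layer resolves where each one lives, might look like the following. The catalog, the source names, and the `get_elements` function are assumptions made for illustration; the article does not describe JPMorgan's actual implementation.

```python
# Hypothetical underlying sources, each holding different data elements.
SOURCES = {
    "positions_db": {"ABC123": {"market_value": 1_000_000.0}},
    "ratings_feed": {"ABC123": {"credit_rating": "AA"}},
}

# Catalog telling the service layer which source owns each data element.
ELEMENT_CATALOG = {
    "market_value": "positions_db",
    "credit_rating": "ratings_feed",
}


def get_elements(security_id, elements):
    """Return the requested elements as if they came from a single source.

    The caller never names a source; the catalog resolves it.
    """
    result = {"security_id": security_id}
    for element in elements:
        source = ELEMENT_CATALOG[element]  # KeyError if the element is unknown
        result[element] = SOURCES[source][security_id][element]
    return result


if __name__ == "__main__":
    print(get_elements("ABC123", ["market_value", "credit_rating"]))
```

Because callers depend only on element names, a source could in principle be swapped by changing the catalog entry rather than every consumer.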
JPMorgan has taken it a step further, according to Burleigh, by governing data at the source before it enters the data lake. By doing so, Burleigh said, the firm doesn't have to worry about altering the data once it's in the data lake.
"We're identifying the source for the data that goes into the lake and we make changes, or the governance says we need to make changes to the data element," Burleigh said. "We make it at the source and it gets reflected in the data lake."
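The pattern in that quote, where corrections are made only at the source and then flow into the lake, can be sketched in a few lines of Python. This is a simplified illustration under the assumption of a one-way copy from source to lake; `source_system`, `data_lake`, `sync_lake`, and `govern_at_source` are hypothetical names, not the firm's tooling.

```python
# Hypothetical system of record and the lake's copy of its data.
source_system = {"ABC123": {"credit_rating": "A"}}
data_lake = {}


def sync_lake():
    """Copy the current state of the source into the lake (one-way)."""
    for key, record in source_system.items():
        data_lake[key] = dict(record)


def govern_at_source(key, element, value):
    """Apply a governance change at the source, then refresh the lake.

    Nothing ever edits the lake directly, so it cannot drift from the source.
    """
    source_system[key][element] = value
    sync_lake()


if __name__ == "__main__":
    sync_lake()
    govern_at_source("ABC123", "credit_rating", "AA")
    print(data_lake["ABC123"])
```

Keeping the lake read-only from a governance standpoint means there is a single place where a data element can be corrected.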
The Bottom Line
- As firms look to consolidate their data, big data lakes have become a popular alternative to traditional data warehouses.
- Big data lakes are an efficient, cost-effective and scalable way to manage large amounts of data thanks to the layers that can be built on top of them.
- Governance functions can also be applied at the source of the data, allowing data to be corrected before it enters the big data lake.
Copyright Infopro Digital Limited. All rights reserved.