GoldenSource is launching a service that aims to help buy-side firms better manage data within their data lakes and warehouses.
The vendor’s Cloud Data Service has two components: a data pipeline and a data schema. The data pipeline moves information from various data sources into and out of an asset manager’s data lake or data warehouse, while the data schema organizes and structures that data. The key element is flexibility, says Tom Stock, head of product management at GoldenSource: the service ingests data in any format; cleans, refines and structures it; and then pushes it downstream into the buy-side firm’s internal analytics systems.
“We are trying to [help firms] get out of that data swamp problem,” Stock says.
For the service, GoldenSource created a common orchestration layer using Apache Airflow, an open-source workflow management platform for data engineering pipelines, as well as the Apache Hive Metastore to manage its metadata. “We used a lot of out-of-the-box, open-source components to construct our end-to-end data ‘lake house’ offering,” he says.
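GoldenSource has not published the pipeline definitions themselves, but the orchestration pattern Stock describes maps naturally onto an Airflow DAG. The sketch below is purely illustrative: the DAG name, task names and schedule are invented, and the task bodies are stubs standing in for the actual ingest, refine and publish logic.

```python
# Minimal sketch of an ingest -> refine -> publish pipeline orchestrated
# by Apache Airflow. All names here are hypothetical, not GoldenSource's.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def ingest_raw(**context):
    """Land vendor and proprietary files in the raw zone as-is."""
    ...

def refine(**context):
    """Validate, deduplicate and normalize raw data into the refined zone."""
    ...

def publish(**context):
    """Push refined data to downstream analytics systems."""
    ...

with DAG(
    dag_id="lakehouse_pipeline",      # hypothetical name
    start_date=datetime(2022, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    ingest_task = PythonOperator(task_id="ingest_raw", python_callable=ingest_raw)
    refine_task = PythonOperator(task_id="refine", python_callable=refine)
    publish_task = PythonOperator(task_id="publish", python_callable=publish)

    # The ordering is the point: raw ingestion, then refinement, then delivery.
    ingest_task >> refine_task >> publish_task
```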
The data pipeline allows consumers to bring in all types of data—vendor data or shared data from a cloud marketplace, as well as their proprietary data in both structured and unstructured formats—and deposit it into their chosen cloud data warehouse or data lake. So if a buy-side firm uses, for example, Cloudera, Snowflake, Databricks, or even Google Cloud Platform, the tool can pull in data from those offerings, and then clean and distribute it to be analyzed.
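For one of the named platforms, that connectivity might look something like the hypothetical snippet below, which reads a vendor price table out of Snowflake using the official Python connector so the data can be cleaned and redistributed. The account, credentials and table names are placeholders.

```python
# Hypothetical sketch: pulling vendor data from one supported warehouse
# (Snowflake) for the cleaning step. Connection details are placeholders.
import snowflake.connector

conn = snowflake.connector.connect(
    account="my_account",      # placeholder
    user="pipeline_svc",       # placeholder
    password="***",
    warehouse="ANALYTICS_WH",
    database="VENDOR_DATA",
)
try:
    cur = conn.cursor()
    cur.execute("SELECT isin, price, price_date FROM raw.vendor_prices")
    rows = cur.fetchall()      # hand off to the cleaning/refining step
finally:
    conn.close()
```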
“We try to keep it very open so that whatever type of analytics tools a customer wants to use, they should be able to deploy those tools on top of our data lake framework to be able to utilize those and have a common experience across their organization,” Stock says.
He uses a portmanteau of data lake and data warehouse to describe what the service delivers: a “lake house”.
“So you’ve got a portion of the lake house, which is the left side—I call it more data lake-oriented, more in tune to rapid ingestion and large volumes of data for analytics—joined with the right side of the lake house, which is the data warehouse-type refined area, which is more structured, normalized across different sources of data,” he says. “If I’ve got position information coming in from multiple custodians, we normalize that and create a single model of that data. So you’ve got different types of data in there for different use cases. Plus, over the top, you have a consumption layer that will allow different ‘personas’ within the organization to access views of that data.”
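As a concrete illustration of the refined zone Stock describes, the following sketch normalizes position feeds from two custodians, each with its own field names, into a single common model. The feeds, column names and figures are all invented for illustration.

```python
# Sketch of refined-zone normalization: position feeds from multiple
# custodians mapped onto one common model. All data here is invented.
import pandas as pd

# Each custodian delivers positions with its own field names.
custodian_a = pd.DataFrame(
    {"ISIN": ["US0378331005"], "Qty": [1_000], "Acct": ["A-123"]}
)
custodian_b = pd.DataFrame(
    {"isin_code": ["US0378331005"], "quantity": [250], "account_id": ["B-987"]}
)

COMMON_MODEL = ["isin", "quantity", "account"]

normalized = pd.concat(
    [
        custodian_a.rename(
            columns={"ISIN": "isin", "Qty": "quantity", "Acct": "account"}
        ),
        custodian_b.rename(
            columns={"isin_code": "isin", "account_id": "account"}
        ),
    ],
    ignore_index=True,
)[COMMON_MODEL]

# One view per instrument across custodians: the single, refined model.
print(normalized.groupby("isin")["quantity"].sum())
```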
Although data warehouses help firms process data, GoldenSource contends they are light on the rapid-ingestion capability traditionally associated with data lakes. “Cloud data warehouses provide a lot of analytical horsepower, but they didn’t have the full capabilities around rapid ingestion, which is what data lakes have. So what we did is marry those together,” Stock says.
Another key is the concept of data concordance, which means having common identifiers. “If I’m bringing in data—even in my raw zone for analytics—I need to be able to cross-reference that data that’s sitting within my refined zone that I’m using for operational reporting,” he says. “So things like instrument IDs—am I using Isin, or Cusip? If I’m looking at organizational entity data, how do I know the entity data that I have sitting in my ESG raw scores can be related to the entity master data and the security position data that I have within my refined zone?”
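The concordance idea lends itself to a simple illustration: a cross-reference table linking identifier schemes lets raw-zone data keyed on one ID be joined to refined-zone data keyed on another. The sketch below is hypothetical; the identifiers, scores and positions are placeholders.

```python
# Sketch of data concordance: a cross-reference table bridges raw-zone
# ESG scores (keyed on Cusip) and refined-zone positions (keyed on Isin).
# All identifiers and values are placeholders.
import pandas as pd

# Cross-reference: one row per instrument, linking identifier schemes.
xref = pd.DataFrame({
    "isin": ["US0378331005"],
    "cusip": ["037833100"],
    "entity_id": ["LEI-PLACEHOLDER-0001"],
})

# Raw zone: vendor ESG scores keyed on Cusip.
esg_raw = pd.DataFrame({"cusip": ["037833100"], "esg_score": [71]})

# Refined zone: security positions keyed on Isin.
positions = pd.DataFrame({"isin": ["US0378331005"], "quantity": [1_250]})

# The xref table is what relates the two zones to each other.
joined = positions.merge(xref, on="isin").merge(esg_raw, on="cusip")
print(joined[["isin", "entity_id", "quantity", "esg_score"]])
```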
New challenge
While cloud has transformed the capital markets, it has also introduced new headaches.
Jeremy Katzeff, head of buy-side solutions at GoldenSource, says one of the challenges asset managers and hedge funds face is the expanding “personas” consuming datasets within cloud-based data platforms and warehouses. What he means by personas is that more units—and, thus, individuals—need more data, whether for portfolio construction, risk management, sales, marketing, or product development. Quant teams alone want more and more data, but the risk is that there’s wasteful spending on data, the wrong people have access to it, or different business units don’t have the same view of the data.
“It’s moving beyond the traditional personas of middle-office tech and ops that need a security master or some cleansed information that they can then give to the portfolio managers to raise an order and execute a trade,” he says.
Eiichiro Yanagawa, a senior analyst at research and advisory firm Celent, says that if GoldenSource’s new Cloud Data Service offering delivers, it will solve a key challenge for data managers in the capital markets—turning data into actionable insights. “If it goes beyond providing partial functionality to improve data lakes to a platform (multi-cloud, multi-user, multi-institution) that turns data into insight, then [GoldenSource’s] strategy is truly the future.”
A cloud of its own
Like so many other companies in the capital markets, GoldenSource is in the midst of its cloud migration journey. It is currently moving towards a serverless setup to take advantage of elastic compute and to make its system more modular and easier to deploy and maintain.
Stock says the migration to the cloud was accelerated by the vendor’s message-based, event-driven architecture. One early step was to move away from being a strictly Oracle-based application and offer Postgres as an option.
“For cloud implementations, Postgres is a database that most of the cloud providers offer and it is widely used in public cloud implementations,” he adds. The other thing GoldenSource is doing is containerizing its application using Docker and having the infrastructure managed by Kubernetes.
Stock explains that GoldenSource has split its application into individual containers that can be easily scaled up for processing on the cloud. “Our business engines are in certain containers, [and] our data model itself is in certain containers,” he says.
The next step is using Kubernetes as the container manager to enable a serverless environment. “Over the last several years, we’ve made those steps to really start leveraging the power of the cloud. … Under the hood, we still have our application server that we use there. Our next step is to remove the application server components out of that and then take our engines that run underneath that application server and migrate those to microservices running on the cloud,” he says.
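The elasticity Stock is after is what Kubernetes provides once the engines run as containers. As a hypothetical illustration, the snippet below uses the official Kubernetes Python client to scale up a made-up “business-engine” deployment; the deployment and namespace names are placeholders.

```python
# Hypothetical illustration of elastic scaling under Kubernetes: bump
# the replica count of a (made-up) engine deployment to absorb load.
# Requires the official `kubernetes` Python client and cluster access.
from kubernetes import client, config

config.load_kube_config()        # read credentials from local kubeconfig
apps = client.AppsV1Api()

# Scale the hypothetical engine deployment to 5 replicas; Kubernetes
# handles scheduling the extra containers onto available compute.
apps.patch_namespaced_deployment_scale(
    name="business-engine",      # placeholder deployment name
    namespace="goldensource",    # placeholder namespace
    body={"spec": {"replicas": 5}},
)
```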
GoldenSource aims to finish most of that migration by early 2023.
“The traditional deployment that most people still have is you buy some virtual CPUs and memory,” adds Katzeff. “We want to move away from that model, so it’s more elastic and more of a pay-as-you-go model, because it’s more scalable. Whereas if you buy something fixed now, and in two years someone hits the top of their capacity, it becomes an in-depth exercise to figure out what the next stages of growth are.”