FactSet Datasets Go Live on Snowflake Data Warehouse
FactSet is the first major data provider to make content available via the cloud-based data-warehousing platform.
FactSet customers can now access 58 of the data giant’s datasets via Snowflake, a cloud-based, cloud-agnostic data-warehousing platform, to support quantitative research and big data analytics.
The release comprises 34 datasets from third parties through Open:FactSet Marketplace, alongside 24 proprietary FactSet feeds. The proprietary feeds include fundamentals; supply chain data; geographic revenues; point-in-time consensus estimates; information on spending trends; news sentiment; and environmental, social, and governance (ESG) data, among others. The datasets take the form of FactSet’s standard data feeds or bulk copies of the company’s individual databases. Users can access them through the Snowflake Data Marketplace.
The advantages offered by the pairing are speed, ease of access, and reductions in the cost of data ownership, says Bryan Lenker, FactSet vice president and director of client technology and services, who gave a demo of the new service to WatersTechnology.
Using a Snowflake function called a share, FactSet deploys its already normalized and integrated data into Snowflake only once, with deltas processed hourly, and the data is live on the client side in less than 10 minutes. Analysts and programmers can then create their own “research sandboxes,” build back-testing models, or perform factor testing with data “clones,” which are copies of datasets that can be made without needing to be physically stored.
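The idea of a clone that does not need its own physical copy can be illustrated with a small analogy. The sketch below is not Snowflake's implementation; it uses Python's standard `collections.ChainMap` to layer a writable, initially empty dictionary over a shared dataset, so the clone stores only its own changes. The names `shared_prices`, `make_clone`, and `sandbox`, and the sample tickers and prices, are hypothetical.

```python
from collections import ChainMap

# Hypothetical stand-in for a shared dataset (ticker -> closing price).
shared_prices = {"AAA": 10.0, "BBB": 20.0}


def make_clone(base):
    """Return a writable view over `base` that stores only its own changes."""
    # Reads fall through to `base`; writes land in the new empty dict.
    return ChainMap({}, base)


sandbox = make_clone(shared_prices)
sandbox["AAA"] = 11.5   # overrides the shared value inside the clone only
sandbox["CCC"] = 30.0   # a new row that exists only in the clone

print(sandbox["AAA"], sandbox["BBB"], sandbox["CCC"])
print(shared_prices)     # the shared data is unchanged
print(sandbox.maps[0])   # the clone physically holds only its two changes
```

Creating the clone copies nothing, and the shared data stays untouched however the sandbox is modified, which is the property that makes such copies cheap to hand out for back-testing.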
Snowflake also cuts the time required to load data and run queries. In a side-by-side comparison, Lenker ran a query against FactSet’s shipping database through both Microsoft SQL Server and Snowflake to determine which countries send the most ships to US ports. The SQL Server query took one minute and 23 seconds to run; Snowflake clocked in at just under 14 seconds. In a second demo query, Snowflake took 16 seconds.
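To put the first comparison in proportion, the short calculation below converts the two reported timings into a speedup ratio. It is a rough sketch: because the Snowflake run is described only as "just under 14 seconds," the code uses 14 seconds as an upper bound, so the result is a lower bound on the speedup. The function and variable names are illustrative.

```python
def speedup(baseline_seconds: float, candidate_seconds: float) -> float:
    """Return how many times faster the candidate ran than the baseline."""
    if candidate_seconds <= 0:
        raise ValueError("candidate_seconds must be positive")
    return baseline_seconds / candidate_seconds


sql_server_seconds = 1 * 60 + 23   # one minute and 23 seconds
snowflake_seconds = 14             # upper bound for "just under 14 seconds"

ratio = speedup(sql_server_seconds, snowflake_seconds)
print(f"Snowflake was at least {ratio:.1f}x faster on this query")
```

A single demo query on one dataset is not a benchmark, so the ratio describes only this run.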
At 12:31 p.m. on the day of the demo, Lenker initiated a data share with a sample client account. By 12:40 p.m., the account had access to 13 new databases.
“I think that’s why you see Snowflake getting so much excitement in the financial industry—the ability to take processes that normally take maybe a week or so and run them in days or every day,” Lenker says.
One example is FactSet’s pricing database. End-of-day pricing for each security covered by FactSet draws on a large amount of historical data that goes back years. Manually loading all that content directly into a client’s on-premises system and creating the structure would typically take the data provider three to four hours from start to finish every time.
The partnership makes FactSet the first major data provider to offer content via Snowflake, which competes with Amazon Redshift, Google BigQuery, and Apache open-source tools like Spark and Hadoop.
Gene Fernandez, CTO at FactSet, says that, compared with other platforms, Snowflake takes more of the onus off its users. Redshift, he says, works very well on AWS, but isn’t built for users of other proprietary clouds. Spark is a great open-source platform that works across different clouds, but its users tend to take on more infrastructure engineering and ongoing maintenance, he says. The idea to partner with Snowflake, however, first came from clients.
“There was quite a lot of client interest in Snowflake, so we took a look at the platform,” Fernandez says. “We’re really optimistic about the partnership and the kinds of things we can do for clients using this platform.”
FactSet is already in the process of injecting more data fields into Snowflake. First on the list is tick data, a “gigantic” database that is difficult to move around and provide access to, Lenker says. Morningstar also moved its tick data to the AWS cloud last year, saying that it had risen on firms’ priority lists.
Second is semi-structured data, such as free text or data organized according to JSON or XML formats. FactSet’s news and transcript data fit into this category.
“Historically you’d have to use something like Python for your semi-structured [data] and SQL for your structured [data],” Lenker says. “Now you can do it all inside one language in one environment.”
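As a rough illustration of handling structured and semi-structured data in a single SQL query, the sketch below joins an ordinary table to a column of JSON documents. It uses SQLite, through Python's standard `sqlite3` module, purely as a stand-in; it is not Snowflake's syntax or FactSet's schema, and it assumes the local SQLite build includes the JSON functions (`json_extract`). The table names, columns, and rows are made up for the example.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")

# Structured data: a conventional table of closing prices (hypothetical rows).
conn.execute("CREATE TABLE prices (ticker TEXT, close REAL)")
conn.executemany(
    "INSERT INTO prices VALUES (?, ?)",
    [("AAA", 10.0), ("BBB", 20.0)],
)

# Semi-structured data: one JSON document per news item, stored as text.
conn.execute("CREATE TABLE news (ticker TEXT, doc TEXT)")
conn.executemany(
    "INSERT INTO news VALUES (?, ?)",
    [
        ("AAA", json.dumps({"headline": "AAA beats estimates", "sentiment": 0.8})),
        ("BBB", json.dumps({"headline": "BBB cuts guidance", "sentiment": -0.6})),
    ],
)

# One SQL statement reads both: a relational join plus a JSON path lookup.
rows = conn.execute(
    """
    SELECT p.ticker, p.close, json_extract(n.doc, '$.sentiment') AS sentiment
    FROM prices AS p
    JOIN news AS n ON n.ticker = p.ticker
    WHERE json_extract(n.doc, '$.sentiment') > 0
    ORDER BY p.ticker
    """
).fetchall()

print(rows)
```

Python appears here only to create the sample tables; the filtering and joining logic lives entirely in the SQL statement, which is the point Lenker makes about working in one language.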
Copyright Infopro Digital Limited. All rights reserved.