FactSet Datasets Go Live on Snowflake Data Warehouse

FactSet is the first major data provider to make content available via the cloud-based data-warehousing platform.

FactSet customers can now access 58 of the data giant’s datasets via Snowflake, a cloud-based, cloud-agnostic data-warehousing platform, with the aim of enhancing quantitative research and big-data analytics.

The release comprises 34 third-party datasets from the Open:FactSet Marketplace alongside 24 proprietary FactSet feeds, which include fundamentals; supply chain data; geographic revenues; point-in-time consensus estimates; spending trends; news sentiment; and environmental, social, and governance (ESG) data, among others. The datasets are delivered either as FactSet’s standard data feeds or as bulk copies of the company’s individual databases. Users can access them through the Snowflake Data Marketplace.

The advantages offered by the pairing are speed, ease of access, and reductions in the cost of data ownership, says Bryan Lenker, FactSet vice president and director of client technology and services, who gave a demo of the new service to WatersTechnology.

Using a Snowflake feature called a share, FactSet deploys its already normalized and integrated data into Snowflake only once (deltas will be processed hourly), and in less than 10 minutes the data is live on the client side. Analysts and programmers can then create their own “research sandboxes,” build back-testing models, or perform factor testing with data “clones”: copies of datasets that can be made without physically duplicating the underlying data.
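One common way to model a “clone” that needs no extra physical storage is copy-on-write over shared, immutable chunks of data. The Python sketch below is a conceptual toy, not Snowflake’s implementation; the `Partition` and `Table` classes and the sample rows are hypothetical. Cloning copies only a list of references, so the sandbox and the source share the same underlying rows until the sandbox writes data of its own.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Partition:
    """Hypothetical immutable chunk of rows; never modified after creation."""
    rows: tuple


@dataclass
class Table:
    """Hypothetical table: just an ordered list of references to partitions."""
    partitions: list = field(default_factory=list)

    def clone(self) -> "Table":
        # Zero-copy clone: copy the list of references, not the row data.
        return Table(partitions=list(self.partitions))

    def append(self, rows) -> None:
        # A write creates a new partition that only this table references.
        self.partitions.append(Partition(tuple(rows)))

    def rows(self) -> list:
        return [row for part in self.partitions for row in part.rows]


source = Table()
source.append([("ABC", 1), ("XYZ", 2)])

sandbox = source.clone()          # no rows are copied here
sandbox.append([("TEST", 99)])    # the sandbox's own write

print(source.rows())
print(sandbox.rows())
```

Because partitions are never mutated, a researcher can add or test data in the sandbox without touching the source rows and without duplicating the data the two tables share.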

Snowflake also cuts the time required to load data and run queries. In a side-by-side comparison, Lenker ran a query against FactSet’s shipping database through Microsoft SQL Server and through Snowflake to determine which countries are sending the most ships to US ports. The query took one minute and 23 seconds on SQL Server; Snowflake clocked in at just under 14 seconds. In a second demo query, Snowflake took 16 seconds.
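The first demo’s timings translate into a rough speedup for that single query (one demo, so not a general benchmark):

```latex
\begin{aligned}
t_{\text{SQL Server}} &= 1\ \text{min}\ 23\ \text{s} = 83\ \text{s} \\
t_{\text{Snowflake}} &\approx 14\ \text{s} \\
\frac{t_{\text{SQL Server}}}{t_{\text{Snowflake}}} &\approx \frac{83}{14} \approx 5.9
\end{aligned}
```

Because Snowflake finished in just under 14 seconds, the actual ratio is slightly above 5.9.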

At 12:31 p.m. on the day of the demo, Lenker initiated a data share with a sample client account. By 12:40 p.m., the account had access to 13 new databases.

“I think that’s why you see Snowflake getting so much excitement in the financial industry—the ability to take processes that normally take maybe a week or so and run them in days or every day,” Lenker says.

One example is FactSet’s pricing database. End-of-day pricing for each security covered by FactSet draws on a large amount of historical data going back years. Manually loading all that content directly into a client’s on-premises system and creating the structure would typically take the data provider three to four hours from start to finish, every time.
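The full reload described above contrasts with the incremental approach implied by the hourly deltas mentioned earlier, where only rows that changed since the last batch are touched. The Python sketch below is illustrative only: the `apply_delta` function, the field names, and the `deleted` flag are hypothetical and do not describe FactSet’s actual feed format.

```python
# Hypothetical schema: (security_id, price_date) -> closing price.
def apply_delta(table: dict, delta: list) -> dict:
    """Apply one batch of changed rows in place instead of reloading all history."""
    for change in delta:
        key = (change["security_id"], change["price_date"])
        if change.get("deleted", False):
            # Row was removed upstream; drop it if present.
            table.pop(key, None)
        else:
            # Either a brand-new row or a correction overwriting an existing one.
            table[key] = change["close"]
    return table


history = {("ABC", "2020-01-02"): 10.0, ("ABC", "2020-01-03"): 10.5}
hourly_delta = [
    {"security_id": "ABC", "price_date": "2020-01-03", "close": 10.6},  # correction
    {"security_id": "ABC", "price_date": "2020-01-06", "close": 10.8},  # new trading day
]
apply_delta(history, hourly_delta)
print(history)
```

The cost of each batch scales with the number of changed rows rather than with the years of history already loaded, which is the point of shipping deltas instead of repeating a multi-hour reload.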

The partnership makes FactSet the first major data provider to make content available via Snowflake, which competes with Amazon Redshift, Google BigQuery, and Apache open-source tools such as Spark and Hadoop.

Gene Fernandez, CTO at FactSet, says that compared with other providers, Snowflake takes more of the onus off its users. Redshift, he says, works very well on AWS but isn’t built for users of other proprietary clouds. Spark is a great open-source platform that works across different clouds, but its users tend to take on more infrastructure engineering and ongoing maintenance, he says. The idea to partner with Snowflake, however, first came from clients.

“There was quite a lot of client interest in Snowflake, so we took a look at the platform,” Fernandez says. “We’re really optimistic about the partnership and the kinds of things we can do for clients using this platform.”

FactSet is already working on bringing more data into Snowflake. First on the list is tick data, a “gigantic” database that is difficult to move around and provide access to, Lenker says. Morningstar also moved its tick data to the AWS cloud last year, saying that tick data had risen on firms’ priority lists.

Second is semi-structured data, such as free text or data in JSON or XML format. FactSet’s news and transcript data fall into this category.

“Historically you’d have to use something like Python for your semi-structured [data] and SQL for your structured [data],” Lenker says. “Now you can do it all inside one language in one environment.”
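Lenker’s point about handling structured and semi-structured data in one language can be illustrated with plain SQL. The sketch below uses Python’s built-in `sqlite3` module as a stand-in, assuming the SQLite build includes its JSON functions (built in by default since SQLite 3.38); the tables, tickers, and sentiment scores are made up. Snowflake’s own syntax differs (it stores such documents in a `VARIANT` column and reads fields with path notation), so this shows the idea rather than FactSet’s or Snowflake’s schema.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")

# Structured table: one typed column per field.
conn.execute("CREATE TABLE companies (ticker TEXT PRIMARY KEY, name TEXT)")
# Semi-structured table: each news item is stored as a raw JSON document.
conn.execute("CREATE TABLE news (doc TEXT)")

conn.executemany(
    "INSERT INTO companies VALUES (?, ?)",
    [("ABC", "ABC Corp"), ("XYZ", "XYZ Inc")],
)
docs = [
    {"ticker": "ABC", "headline": "ABC beats estimates", "sentiment": 0.7},
    {"ticker": "XYZ", "headline": "XYZ cuts guidance", "sentiment": -0.4},
    {"ticker": "ABC", "headline": "ABC opens new plant", "sentiment": 0.2},
]
conn.executemany("INSERT INTO news VALUES (?)", [(json.dumps(d),) for d in docs])

# One SQL query joins the structured table with fields read out of the JSON.
rows = conn.execute(
    """
    SELECT c.name,
           COUNT(*) AS stories,
           AVG(json_extract(n.doc, '$.sentiment')) AS avg_sentiment
    FROM news AS n
    JOIN companies AS c ON c.ticker = json_extract(n.doc, '$.ticker')
    GROUP BY c.name
    ORDER BY c.name
    """
).fetchall()

for name, stories, avg_sentiment in rows:
    print(name, stories, round(avg_sentiment, 2))
```

Both the join condition and the aggregate read fields directly from the JSON documents, so no separate parsing step in another language is needed at query time.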
