Alt data’s growing pains: Integration and aggregation challenges stall wider adoption

Demand for alternative data continues to grow among investment firms. So why are some alt data providers taking products off the market?

Credit: Samuel Torres

Alternative data—generally, data generated as a byproduct of other business activities—has been the darling of hedge funds over the past half-dozen years. But after a blinding start, has the alt data race stalled, or merely settled into a more stable pace for the long haul?

According to a report from research firm Coalition Greenwich earlier this year, 44% of investment firms already use alt data to support portfolio construction or trading, and a further 19% of respondents plan to start using it over the next 12 months.

However, London-based alternative data consultancy Neudata has noticed a worrying trend: the past three years saw big jumps in the number of alternative datasets being taken off the market—and in some cases, companies pausing or exiting their data monetization side projects. Over the three-year period from 2017 to 2019, Neudata counted a total of 37 datasets taken off the market. In 2020, that jumped to 68, with 41 leaving the market in 2021, and another 59 datasets being discontinued last year. The largest single data type leaving the market (16% of the total in 2022) was web-scraped data—often used to compile and compare aggregated prices for goods and services, or to collect data on individuals from public sources, such as social or professional networks.

One reason for this may be that there were too many companies doing this, but doing it badly—say, being inconsistent in the data they scrape from different websites, or scraping datasets that, while perhaps interesting intellectually, didn’t actually deliver anything valuable, says Daryl Smith, head of research at Neudata.

“In the world of alt data, it’s easy to start a company and start scraping websites rather than, for example, finding a new source of credit card data. So, small companies may start up, but run out of money while trying to get clients and set up trials.”

Web-scraped data may also have suffered the most because web scraping is straightforward enough that most firms can do it themselves, says Tim Gavin, former president and co-founder of Ark Data, a startup intended to create structure around alternative data. Ark Data was bought by BlackRock in 2021.

In addition, web-scraped data suffers from quality issues, and it often isn’t mapped to the market and reference data that would make it useful for investing, Gavin adds.

“The data you’re scraping is unlikely to be tagged to a company identifier, and very often you are trying to ‘fuzzy match’ on a company description. Or, you might get a lot of great data on products, but you need to get that product data mapped back to the company that makes that product. These are not easy lifts for your average user at smaller shops.”
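The “fuzzy match” Gavin describes can be sketched in a few lines of Python. This is a minimal illustration rather than any vendor’s actual method: the reference table, the identifiers, the suffix list, and the 0.8 threshold are all hypothetical, and the similarity measure comes from the standard library’s difflib module.

```python
from difflib import SequenceMatcher

# Hypothetical reference data: identifier -> legal name (illustrative only).
REFERENCE = {
    "ID-001": "Acme Holdings Inc",
    "ID-002": "Globex Corporation",
    "ID-003": "Initech LLC",
}

# Assumed list of legal suffixes to ignore when comparing names.
SUFFIXES = {"inc", "corp", "corporation", "llc", "ltd", "plc", "co", "holdings"}


def normalize(name: str) -> str:
    """Lowercase, replace punctuation with spaces, drop common legal suffixes."""
    cleaned = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in name.lower())
    return " ".join(tok for tok in cleaned.split() if tok not in SUFFIXES)


def match_company(scraped_name: str, threshold: float = 0.8):
    """Return (identifier, score) for the closest reference name.

    The identifier is None when the best score is below the threshold,
    so weak matches can be routed to manual review instead of guessed.
    """
    target = normalize(scraped_name)
    best_id, best_score = None, 0.0
    for ident, ref_name in REFERENCE.items():
        score = SequenceMatcher(None, target, normalize(ref_name)).ratio()
        if score > best_score:
            best_id, best_score = ident, score
    if best_score < threshold:
        return None, best_score
    return best_id, best_score


if __name__ == "__main__":
    for raw in ["ACME Holdings, Inc.", "Initec LLC", "Totally Unrelated Bakery"]:
        print(raw, "->", match_company(raw))
```

Even in this toy form, the work sits in the normalization rules and the threshold, which is where a production mapping effort would likely need far more care, for example when two companies share near-identical names.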

Furthermore, there may be lingering uncertainty around the legal rights to collect and use web-scraped data. Although a recent lawsuit in which LinkedIn accused HiQ Labs of illegally scraping user profiles from its site was settled, the court ruled both that web scraping is legal and that the way in which HiQ scraped the data violated LinkedIn’s user agreement.

The result, Neudata’s Smith says, is that you can still scrape data from LinkedIn—or have someone else do it for you—so long as you don’t bypass the site’s logins. “So, web scraping companies who do well will have to constantly prove to clients that their datasets are compliant.”

Kelly Koscuiszka, partner at law firm Schulte Roth & Zabel, where her responsibilities include conducting due diligence reviews of alternative datasets and their providers on behalf of financial firms, says it’s unlikely that web-scraped datasets are disappearing as a result of compliance concerns because that’s an easy fix. Instead, she says, new privacy regulations could make or break providers of certain datasets.

“It could be that what you’re doing was fine, but then the rules change … and that the investment required to make your data compliant means the margins don’t make sense anymore,” Koscuiszka says. This not only changes the competitive landscape of data providers, but also the “panel” of data available to end-users, potentially resulting in narrower coverage and less information to support decisions.

“If a data source decides it can’t comply, then the data aggregator has fewer sources,” she adds.

But it’s not just limited to web-scraped datasets and their suppliers. According to Neudata, fundamental datasets and location-based data accounted for 11% and 10% of discontinued data types, respectively. Add in discontinued datasets in the transactional, ESG, web-tracking and crowdsourced data spaces—some of which could fall foul of the privacy concerns noted by Koscuiszka—and these collectively account for more than two-thirds of the departures.

Does this mean the alt data industry as a whole is in trouble? Probably not. Indeed, Neudata’s Smith says the numbers represent natural erosion, and he notes that the alt data market isn’t shrinking overall. But it does mean that individual companies banking on an alt data strategy may have a harder time competing and surviving in an uncertain economic climate.

Some vendors may have financial issues, while others may actually be successful at selling some products, and decide to sunset any less successful ones. Or, there may be a strategic component behind a decision to take a product off the market: for example, a company may appoint a new CEO who decides that selling “exhaust” data is non-core to its business and shuts down that division, Smith says.

‘It takes time to get traction’

Scott Hall, co-founder and CEO of AltHub, a sales advisory and accelerator firm serving “high alpha” startup alt data providers, notes that last year was a challenging one for asset managers, which had a knock-on effect on purchases of data products. While trials were happening, deals weren’t being closed.

One reason for this, Hall says, is that the world of alt data is still a small pond with a few big fish and a lot of minnows. At one extreme, there’s concentration risk, while at the other, smaller companies risk over-extending themselves.

“Probably 70% to 80% of overall revenues from alternative data are attributable to a small number of companies,” such as large credit card firms, which—though they may not be widely perceived as alt data companies—provide the bulk of monetized alt data, Hall says. “Then there are a bunch of companies with fewer than five hedge fund clients—and maybe those can continue if they have enough core revenue from those clients.”

Then there’s a third category of newer alternative data companies, founded over the past few years specifically to take advantage of the potentially profitable—but also potentially risky—opportunity to sell to the financial vertical.

“These typically aren’t well funded—and you need a minimum viable budget to become a player in the alt data space. It takes time to get traction, to get that first client, and to get the first five clients. And last year hasn’t helped,” Hall says.

Samantha Campbell, former CEO of alt data advisory and accelerator firm Alqami, and now global head of data monetization and management at CK Delta, the data arm of CK Hutchison Holdings, agrees, noting that alt data typically has a long sales cycle. Buyers expect data to be institutional-quality, and to be granular, while also having broad coverage. One challenge for alternative data sources is that their data is often a byproduct of their core business operations, and is not presented in an institution-ready way. That’s what companies like AltHub and Alqami specialize in.

“The reality is, the barriers to entry to selling to hedge funds are very high,” Campbell says. “If you have a niche dataset, there is definitely the potential to deliver information advantage. But it will take time to find the right buyer, so you have to weigh the cost of developing and packaging that data against the time taken to generate revenue.”

In some cases, companies may mistakenly believe their exhaust data is worth more than firms are willing to pay in reality, and may create data businesses to sell something that has little real value, adds Matt Ober, general partner at seed-stage investment firm Social Leverage.

“There are new data vendors popping up every day, but I’m not sure if they’re all truly unique. There are a lot of people who have data and are interested in selling it to hedge funds, but don’t realize that hedge funds won’t pay millions of dollars for single datasets. It’s a long, hard road to sell to hedge funds, and if they have a bad year, they’ll cut you off.”

To be a successful vendor, companies need an array of data products—not just one niche dataset of dubious value on its own—because not every dataset will be considered new all the time, Ober says. And companies should target sales at corporate clients for competitive intelligence as much as hedge funds.

“Selling alpha is hard. So, the goal should be to become a better dataset.”

Yet, despite the “natural turnover” of datasets and providers, the volume of alt data and the number of providers continue to grow. Neudata says the number of new datasets coming to market far outstrips those being discontinued. And even web-scraped datasets, currently the most prominent among those leaving the market, aren’t doomed to failure.

“If anything, the number of companies providing web-scraped data is probably going to increase because it’s so easy to set one up and provide digestible information,” says Neudata’s Smith.

SRZ’s Koscuiszka agrees: “Web scraping is becoming more common, and people are becoming more comfortable with it. I think some of the scariness and mystery goes away as it becomes more ubiquitous.”

Heavy lifting

The same could be said for any alt dataset. The less mysterious it is, the more transparent it becomes, and—in theory—the more widely used it becomes. According to the Coalition Greenwich report, 79% of firms have increased the number of alt data sources they use. Yet, only 47% increased their budgets for alt data, and only 42% increased their internal resources focused on alt data.

As a result, firms are having trouble coping with the amount of new data available. The number of datasets continues to grow faster than the number of data scientists who can interpret and use them, so user firms can’t perform a thorough analysis of everything on offer, Hall says.

“Firms don’t have the bandwidth to evaluate them all, so they’re constrained in terms of their ability to bring on new products,” Hall says. “There are only maybe 75 funds in the world who can do the heavy lifting of handling all that data. All the others need something more refined that focuses on just a few stocks.”

For some clients, that can be sufficient, and a vendor whose data is only mapped to certain industries or sub-segments could still be successful. But for those who need broader data mapped to a larger universe of securities, either the financial firm or its supplier (if the firm lacks the in-house data science resources to do the work itself) needs to invest in additional data mapping. Either option increases the work and cost of acquiring and integrating the data, and lengthens the time before the data can be used. Altogether, this will affect a firm’s decision on whether to buy it.

One way around this is to partner with larger, traditional market data vendors that have access to broader datasets and the experience to link alt data to mainstream market and reference data, using the bigger vendor as a value-adding channel to market. Another is simply to sell out to one of those bigger vendors. The alternative is to risk being pushed aside by them once they acquire access to alt data sources.

If the Refinitivs or Bloombergs of the world can offer something similar enough to what a niche alt data vendor can provide, and can leverage their existing relationships with financial firms—whereas new alt data startups must build that client base from scratch—then that’s a big problem for alt data providers, says Michael Beal, co-founder and CEO of investment firm Data Capital Management.

And as the number of alt datasets continues to grow, firms may relish the thought of only dealing with one provider who aggregates alt data, rather than dozens or more different suppliers—one for each dataset, Beal says.

“The analogy I use is the industrial revolution and railroad tracks. Historically, there were big fights between the Carnegies and the Vanderbilts … but in reality, there only needs to be one rail line connecting Pittsburgh and New Jersey, with lots of different trains riding on that one track,” he says.

In fact, the three biggest challenges to alt data adoption—integration, interoperability, and aggregation—can only be solved by larger aggregators, says David Easthope, head of fintech, market structure and technology at Coalition Greenwich, who authored the recent report.

Sell smarter

For small, independent alt data providers, success or survival will depend on their ability to present their products better and sell them more persuasively. Better presentation, such as a front end for querying large amounts of alternative data like the interfaces developed by Exabel or Maiden Century, can make the data easier to access. However, vendors still need to tailor more targeted sales pitches.

“Hedge funds used to call vendors. Not anymore. So, you need to present better use cases for your datasets,” says AltHub’s Hall.

If they don’t, they can expect to miss out on much broader potential opportunities, adds CK Delta’s Campbell. “There are thousands of hedge funds, asset managers, and private equity firms who would want to use alternative data, but only a small segment have the resources to do it.”

Specifically, Social Leverage’s Ober estimates that only around 25% of investment firms are actually using alternative data right now. Coalition Greenwich’s report places that figure higher, at 44%, although the difference may be accounted for by differing definitions of what constitutes “alternative,” or by the extent to which firms are using it and for what purpose.

And for firms outside of that segment, tapping alternative data may not be viable, even though they’d like to use it. “Maybe the Two Sigmas and WorldQuants can make something of raw alternative data, but for the average potential client, dealing with that data is tough. They’re going to say ‘You’re just going to hand me that? What am I supposed to do with that?’” says Ark Data’s Gavin.

Not only must suppliers sell and present better, they must also do it more quickly. Campbell says there may only be a small window of opportunity during which a dataset is useful to sophisticated hedge funds. Within six months of subscribing to a dataset, the fund starts to see the alpha decay. After that, the dataset is no longer useful, and the firm looks for a new edge.

While established data vendors may account for alpha decay on a continual basis by constantly updating their datasets, new entrants to the market must manage that and a number of other factors within that timeframe or risk missing sales opportunities—and getting over those initial hurdles can take time.

Campbell says that when she co-founded Alqami, many companies approached her claiming they were sitting on a gold mine of data, but would sometimes stumble at Alqami’s first interrogation of their data: whether the company had the right to sell the data, or, for example, whether it contained personally identifiable customer data. Then Alqami would review the data quality itself, looking at its history (how much is available and how complete it is), its consistency and frequency, and its differentiating factors compared to other datasets, i.e., what makes it uniquely valuable. Then comes the hardest part: getting it into the hands of end-users willing to test it and put it through the wringer.
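Some of the quality checks described above (length of history, completeness, and consistency of delivery frequency) lend themselves to a simple automated profile. The Python sketch below is an illustration under stated assumptions, not Alqami’s actual process: the record layout with "date" and "value" fields and the profile_dataset function are hypothetical. Questions such as the right to sell the data or the presence of personal information still need legal and manual review.

```python
from datetime import date
from statistics import median


def profile_dataset(records, value_field="value"):
    """Summarize history, completeness, and delivery regularity.

    `records` is assumed to be a list of dicts, each with a 'date'
    (datetime.date) and a value field that may be None when missing.
    """
    if not records:
        return {"history_days": 0, "completeness": 0.0,
                "typical_gap_days": None, "irregular_gaps": 0}

    dates = sorted(r["date"] for r in records)
    # History: span between the first and last observation.
    history_days = (dates[-1] - dates[0]).days
    # Completeness: share of records that actually carry a value.
    filled = sum(1 for r in records if r.get(value_field) is not None)
    # Frequency: gaps between consecutive observations, in days.
    gaps = [(later - earlier).days for earlier, later in zip(dates, dates[1:])]
    typical_gap = median(gaps) if gaps else None
    # Consistency: how many gaps differ from the typical one.
    irregular = sum(1 for g in gaps if g != typical_gap)

    return {"history_days": history_days,
            "completeness": filled / len(records),
            "typical_gap_days": typical_gap,
            "irregular_gaps": irregular}


if __name__ == "__main__":
    # Made-up weekly sample with one missing value and one skipped week.
    sample = [
        {"date": date(2023, 1, 1), "value": 10.0},
        {"date": date(2023, 1, 8), "value": None},
        {"date": date(2023, 1, 15), "value": 12.0},
        {"date": date(2023, 1, 29), "value": 13.0},
    ]
    print(profile_dataset(sample))
```

A profile like this only flags where to look. Whether a gap or a missing value matters depends on how the buyer intends to use the data.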

At the same time, startups are under pressure from investors to deliver higher returns, faster. “When we were fundraising, there was lots of VC money out there,” Gavin says. “But now, with the climate changing, it’s significantly harder for firms to close that first round, and VCs may now be saying to their existing portfolio companies, ‘You only get one round for the foreseeable future, so make that cash last.’ They want to see profits.”

And once they get hold of the data, firms need to be able to respond more quickly so they can make purchasing decisions with enough time to take advantage of the data before alpha decay strikes. As such, firms will need to develop or buy tools that help them understand datasets and derive actionable value from them faster.

Once the tools are in place to help non-data scientists within firms take advantage of the data, then alternative data will become more mainstream. Until then, it remains the domain of a few data scientists with specific skillsets—skills that command high salaries, and therefore put a large data science team beyond the reach of some firms.

“Hedge funds will always look to go direct to the source of proprietary data,” says Coalition Greenwich’s Easthope. “But more traditional asset managers and hybrid quantamental funds will look to their existing relationships to source some of this alternative data, because they don’t have the resources in-house and can’t just go out and hire more staff.”

Inevitably, this will lead to concentration and consolidation—and fewer providers serving the space with broader offerings. While Neudata hasn’t crunched the numbers for the first few months of 2023 yet, Smith says, “I think it’s safe to say discontinuations are on track to exceed last year.”

But some say that’s inevitable as the alt data market matures and settles down. “The institutionalization of alternative data will naturally lead to a desire for fewer, stronger providers, rather than smaller, riskier vendors,” says DCM’s Beal. “VC firms see that, and funding will go to those few with the ability to find something truly new.”

And that’s the real challenge for the alt data community: alt data provides an edge, but it’s an edge that is short-lived and easily dulled. Each dataset must deliver something unique that’s hard for others to replicate, or be constantly sharpened to remain useful. And for its use to become more widespread, firms must either invest in more data science resources, or rely on tools that vendors will need to provide that make it easier to understand the value of the data. Then, once the industry reaches full maturity, it will finally outgrow its growing pains.
