The IMD Wrap: Beginning of the end for data audits?

This week, there’s exciting news for data bean-counters in the form of a partnership between two vendors that could change the way we view and track data usage and audits.

These fortnightly editorial columns aren’t intended to break news; rather, they generally recap issues we’ve already written about, tying themes together and placing them in context with a little personal commentary. This time, however, is different. This time, I get to break some news, too.

OK, so it may not be earth-shattering, but if, like many, you don’t like data audits, or if you’re concerned about tracking the data used by AI applications and large language models, read on.

But first, ask yourself this: why do we audit? Exchanges, data sources and providers audit those who consume their data because it has value, and they want to ensure they are properly compensated for it, and that it isn’t being used to create more value outside the terms of the contracts that govern its use. 

Why would vendors think that was the case? Because there have been frequent instances—usually, though not always, accidental—of unlicensed usage or data leakage and underreporting. Underreporting results in underpaying, and vendors and exchanges don’t like to be underpaid. It leads to suspicion and distrust.

Now, to be fair, there’s distrust on both sides. End-users complain that vendors price their services not on the intrinsic value of the data, but on the value the user gets out of it. That is why non-display and derived data clauses exist and typically cost more than ordinary display licenses: machines can perform more trades than a human trader, or create more valuable derived works based on a vendor or exchange’s original data.

Hence, end-users are reluctant to reveal their strategies and how they use data. Meanwhile, the vendors and exchanges see that reluctance and raise the stakes, suspecting that clients are withholding how they’re using the data, and may be trying to avoid paying what they owe. In short, there’s no transparency on either side, and no trust.

Mark Almeida spent 20 years in various roles at Moody’s Investors Service, then another dozen years as president of Moody’s Analytics. He says vendors would always include the right to audit in contracts, but would rarely exercise it because of how intrusive and unpopular audits are.

Today, Almeida, among other things, sits on the board of Tractiv, which is teaming up with VendEx Solutions. Avid readers may recognize these company names, but for those who don’t, here’s a quick refresher: Tractiv develops data tracking and tracing technology that monitors the movement of data in detail and records it in a ledger. 

VendEx operates a catalog of data vendors and services (VSource), which can be mapped to regulatory requirements, and digitizes details of contractual terms and usage rights in a database called VKey. Last week, it was granted a patent for its VendEx Identifier (VID), a “Cusip-like” alphanumeric code identifying all vendor services.

Just as any security has a unique identifying code that tells you where it’s traded, VendEx’s VID assigns a code to each dataset, making it uniquely identifiable. The VID tells you who supplies the data and whether it comes from a parent organization or subsidiary, and it identifies each of a vendor’s products in a way that defines exactly what data the product contains at a granular level, including asset class and currency.

This establishes the provenance of any piece of data, which allows content originators to know what data is being used. That is a timely issue when AI applications are deriving answers from data they may not be supposed to access. It also creates a standard that can be used in the data catalogs being rolled out by firms seeking to better manage their data costs while exploring and gaining exposure to datasets they may not be aware of.

The alliance between the two companies takes the VID, which defines what data is being used, along with the digitized usage rights code, which defines how and where it can be used. Tractiv’s engine then tracks how and where the data is used in comparison to the information from VendEx—effectively, enforcing and auditing data usage in real time. VendEx defines permitted and prohibited uses, required activities, and exceptions across uses such as publishing, sharing, storing data or creating derivative works.
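To make that comparison concrete, here is a minimal Python sketch of checking a single usage event against digitized usage rights. It is illustrative only: the `Use` categories, the `UsageRights` and `UsageEvent` records, the `check` function, and the placeholder identifier `VID-EXAMPLE-001` are all assumptions invented for this example. Neither VendEx’s actual VID format nor Tractiv’s engine is described in enough detail here to reproduce.

```python
from dataclasses import dataclass
from enum import Enum


class Use(Enum):
    """Hypothetical usage categories, loosely based on those named in the column."""
    DISPLAY = "display"
    STORE = "store"
    SHARE = "share"
    PUBLISH = "publish"
    DERIVE = "derive"


@dataclass(frozen=True)
class UsageRights:
    vid: str                 # placeholder identifier, not a real VID
    permitted: frozenset     # uses the contract explicitly allows
    prohibited: frozenset    # uses the contract explicitly forbids


@dataclass(frozen=True)
class UsageEvent:
    vid: str
    user: str
    use: Use


def check(event, rights):
    """Classify one observed usage event against the digitized rights."""
    r = rights.get(event.vid)
    if r is None:
        return "unlicensed"      # no contract on file for this dataset
    if event.use in r.prohibited:
        return "prohibited"
    if event.use in r.permitted:
        return "permitted"
    return "unspecified"         # contract is silent, so flag for review


# Assumed example contract: display and storage allowed, derived works forbidden.
RIGHTS = {
    "VID-EXAMPLE-001": UsageRights(
        vid="VID-EXAMPLE-001",
        permitted=frozenset({Use.DISPLAY, Use.STORE}),
        prohibited=frozenset({Use.DERIVE}),
    )
}

if __name__ == "__main__":
    event = UsageEvent("VID-EXAMPLE-001", "desk-a", Use.DERIVE)
    print(check(event, RIGHTS))
```

The design point is that a use the contract neither permits nor prohibits gets its own outcome rather than being silently allowed, which is where a vendor and client would most likely need a conversation.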

“From there, we record the usage on a centralized, immutable ledger that can be made available to the consumer and/or the provider of the data,” says Tractiv CEO Drew Orsinger, who explains that the tracking can be used for different purposes by both users and vendors. Vendors can monitor where data is used to ensure compliance, while users can actively monitor how much a particular dataset is used. 

For example, if Tractiv provisions a desk with a dataset but nobody actually uses it, it can alert the users so that they can cancel the subscription and avoid paying for unused data. “We can make sure that consumers are only paying for the data they need and ensure that vendors are properly compensated,” says Orsinger.
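The unused-data case can be sketched in a few lines of Python. This is a rough illustration under stated assumptions, not a description of how Tractiv does it: the `unused_entitlements` function, the (desk, dataset) pairs, and the 90-day idle threshold are all hypothetical choices made for the example.

```python
from datetime import date, timedelta


def unused_entitlements(entitlements, last_used, today, idle_days=90):
    """Return provisioned (desk, dataset) pairs with no recent recorded use.

    entitlements: set of (desk, dataset) pairs that are being paid for.
    last_used:    dict mapping a pair to the date it was last accessed.
    idle_days:    assumed threshold; a real policy would be set per contract.
    """
    cutoff = today - timedelta(days=idle_days)
    idle = []
    for pair in sorted(entitlements):
        seen = last_used.get(pair)
        # Never used, or not used since the cutoff date.
        if seen is None or seen < cutoff:
            idle.append(pair)
    return idle


if __name__ == "__main__":
    entitlements = {
        ("fx-desk", "dataset-a"),
        ("fx-desk", "dataset-b"),
        ("rates-desk", "dataset-a"),
    }
    last_used = {
        ("fx-desk", "dataset-a"): date(2024, 5, 30),
        ("rates-desk", "dataset-a"): date(2024, 1, 2),
    }
    for desk, dataset in unused_entitlements(entitlements, last_used, date(2024, 6, 1)):
        print(f"{desk} has not used {dataset} recently")
```

With the sample data, the FX desk’s second dataset has never been touched and the rates desk’s dataset was last used in January, so both would be flagged as candidates for cancellation.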

The potential opportunity here is obvious: by introducing that level of transparency and control, you can ensure downstream tracking and usage of data remains within its terms and conditions. Yes, downstream and tertiary breakups of data may still prove a challenge, but this is a good first step towards solving that, too. And if both sides have access to the tracking information, consumers can be aware of any potential breaches and address them, while suppliers can see where their data is being used and can trust that their clients are adequately protecting their intellectual property.

“Twenty years ago, the attitude was to ‘ask forgiveness, not permission.’ Now, the majority of consuming firms want to understand their usage rights so they can stay within the guardrails,” says VendEx CEO Richard Clements. Most firms would rather pay the right fees and know they’re compliant, not just because of the risk of audits, fines and penalties, but also because they don’t want the bad press or to alert other data sources that they may be non-compliant and find themselves subject to audits from all their suppliers.

And if a vendor discovers that a client has extended its usage beyond those guardrails, rather than finding out years later and threatening penalties and audits, the vendor can approach the firm immediately and upsell it with the additional products or usage rights that it needs.

“Our patent and the Tractiv partnership gives us the ability to change the way data is sold and how usage is tracked,” Clements says. “It’s extraordinarily expensive and hard to bring in data, normalize it, and get it to clients. So, we’re protecting vendors’ ability to do that for clients.”

You see where this is going, right? The purpose of audits is to discover usage where no transparency exists. But if you can introduce that transparency in real time, the need for audits goes away. I say again, if both sides are on board, then the need for those intrusive, burdensome, combative, time-consuming—and frankly expensive—audits goes away.

For exchanges like TMX Group, which has already embarked on eliminating or reducing its reliance on audits, this provides the technology toolkit that will help make ambitions a reality.

Of course, this isn’t going to happen overnight. The vendors will start by reaching out to mutual clients, but building critical mass will take time.

“You’re talking about changing behavior that has become entrenched over the last 30 or 40 years,” says Almeida. “Today, you can buy a Tesla that costs more than $100,000 entirely online without having to talk to a human being, but you can’t buy a $35,000 subscription for any data without meeting a sales rep to discuss your options. As an industry, we haven’t figured out a way to sell data in a technology-enabled way.”

But in the meantime, there’s another use case that makes this particularly timely: the challenge of ensuring AI tools and LLMs are using data they’re allowed to use, and that vendors are properly compensated. Indeed, that “fortuitous timing” was a catalyst for the vendors joining forces, Almeida says.

VendEx’s Clements adds that while all the vendors now have AI arms, they need more data, and they need to understand the provenance and usage rights and monitor whether they’re staying within the prescribed boundaries. “We’re seeing data show up in AIs that a company doesn’t have a right to use... so there’s a massive opportunity for VendEx and Tractiv to supply solutions in a way that no one else can.”

I’m sure that before too long, VendEx and Tractiv won’t be the only game in town. But by being the first to test the waters, they’ll be lighting the way for others to follow. Clearly there’s a large market potential for this—both in AI and in traditional data consumption—and others will want a slice of that pie.

If you’d like to share your thoughts on the topics of data tracking or audits, or if you’re already doing this, then please reach out to me at max.bowie@infopro-digital.com
