From pharma to finance: Cracking the DNA of data management

Enterprise data management has traditionally addressed any aspect of financial data across an organization. But as investment firms’ portfolios of enterprise data broaden in definition to include non-financial data, EDM projects must also expand to embrace new data types.

As investment firms seek an edge to help them make better and more profitable decisions, they’re delving deeper into the data behind industries, companies, and their products to perform more in-depth analysis. However, in many industries, while detailed data on a company and its business areas may be available, it’s often not available in a format that is easy to integrate with price data, financial data, or reference data.

So for investment firms focused on a specific industry or theme, such as biotech or ESG (environmental, social and governance), there aren’t yet established identifiers or ontologies that allow them to combine the data that they believe will provide extra insights with the functional financial data residing in their enterprise data management systems. And while the efforts of industry associations and the vendor community are giving hope for future data interoperability, those wanting to take advantage of that data today are left either plowing time and money into custom data mapping and integration projects, or trying to squeeze the “square pegs” of new data types into the “round holes” of existing technology platforms.

Boston-based fund manager RA Capital invests in public and private companies across the medical, healthcare, and life sciences fields. The $10 billion fund looks at specific factors that could impact healthcare, pharmaceutical, or biotech companies, such as drug names, formulations, and clinical trials. The problem is, that data doesn’t easily translate into numerical values, nor does it come with the kind of identifiers or descriptive reference data that would allow a firm to easily map it to a company name or stock price.

The firm was founded in 2001, and until now had let its research team manually manage the process of sourcing and correlating data to drive investment decisions. As the firm grew its assets under management (today it manages over $10 billion), the volume of data it subscribed to increased. This included news feeds, as well as niche content sets that track relevant events such as drug trials and approvals, and attributes of individual drugs, in addition to company-related data such as information on members of a company’s board of directors.

As a result, the need to invest in a proper enterprise data management system became more apparent to executives at RA Capital. Not all of these datafeeds lend themselves easily to traditional market data platforms, and those that don’t would commonly be managed in an ad-hoc manner in spreadsheets or an analyst’s own database, says Chris Caliri, chief information officer at RA Capital.

“Our data was becoming unmanaged and uncontrolled because we didn’t have a central repository. We noticed things like company records going stale, or missing data. For example, a corporate action could cause issues and we would have to ask why certain values were wrong. If we were a $1 billion shop, we most likely wouldn’t need an EDM platform; we’d still be doing all this in spreadsheets and siloed databases,” Caliri says.

But even though the firm was spending thousands of dollars on datafeeds, it wasn’t seeing the full benefit because it couldn’t efficiently access that non-financial data and match it to relevant market data.

So, in 2020, the firm began evaluating EDM platform providers, including UK-based data technology vendor Xenomorph. The firm kicked the tires on about a dozen providers before picking Xenomorph. Each of the others could do some of what RA Capital needed, but would either be unable to handle specifics within the data and the mappings between specific drugs and their manufacturers, be unable to get data in and out of its platform quickly, or turn the work into a longer-term consulting project. Caliri says that Xenomorph could demonstrate on the fly how to recreate the firm’s workflows in its data model.

“We wanted to implement a platform like Xenomorph that could manage all data and feed it to users and/or downstream applications. We needed to get away from our manual processes,” he says.

After running a proof-of-concept in late 2020, Xenomorph began officially working with RA Capital in early 2021, and by September rolled out the first phase of its platform. That first project involved understanding the firm’s needs and the data modeling required, connecting to the firm’s data suppliers, matching data between external vendors and internal systems, and helping it generate analysis and reports based on the data, says Naj Alavi, New York-based president of Xenomorph.

Part of the challenge was mapping the key interrelationships between different categories of data defined by RA Capital that all relate to the same drug or manufacturer—not just a company name or drug brand, but separate events, indicators and milestones that relate to all aspects of a drug and its development, says Mark Woodgate, who founded Xenomorph in 1995.

“The complexity of physically storing the data is one thing, but with all of those complex interrelationships, a standard security master just wouldn’t fly. You would need to create a whole new schema,” Woodgate says. Because Xenomorph’s platform is data-agnostic and able to support any type of data, Woodgate says, it was able to match and merge data from multiple external vendors, each supplying different data elements, with the firm’s internal systems to link data to companies and drugs, even for companies that aren’t publicly listed.

“The security master part was fairly straightforward, especially on the publicly traded equities side,” says Ernesto Gonzalez, senior business analysis manager in RA Capital’s IT department. “It gets more challenging to match and merge on the private companies side because there are no identifiers. So, we partnered with Xenomorph to build out rules capable of merging data from different vendors and various types of entities—for example, companies and drugs from two different data providers.”
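The kind of rule Gonzalez describes can be pictured with a minimal Python sketch. It is not Xenomorph’s or RA Capital’s implementation: the vendor records, field names, and the suffix list are all hypothetical, and the only rule shown is matching on a normalized company name when no shared identifier exists.

```python
import re

# Hypothetical list of legal suffixes to ignore when comparing company names.
LEGAL_SUFFIXES = {"inc", "corp", "corporation", "ltd", "llc", "plc", "co"}


def normalize_name(name: str) -> str:
    """Lowercase, drop punctuation, and strip trailing legal suffixes."""
    tokens = re.sub(r"[^a-z0-9 ]", " ", name.lower()).split()
    while tokens and tokens[-1] in LEGAL_SUFFIXES:
        tokens.pop()
    return " ".join(tokens)


def match_and_merge(vendor_a, vendor_b):
    """Merge records that share a normalized name; collect the rest for review."""
    index_b = {normalize_name(rec["name"]): rec for rec in vendor_b}
    merged, unmatched = [], []
    for rec_a in vendor_a:
        key = normalize_name(rec_a["name"])
        rec_b = index_b.get(key)
        if rec_b is None:
            unmatched.append(rec_a)
            continue
        # Vendor A's fields win on conflict; vendor B fills in the gaps.
        merged.append({**rec_b, **rec_a, "match_key": key})
    return merged, unmatched


# Made-up records from two hypothetical data providers.
vendor_a = [
    {"name": "Acme Therapeutics, Inc.", "lead_drug": "AX-101"},
    {"name": "Borealis Bio LLC", "lead_drug": "BB-7"},
]
vendor_b = [
    {"name": "ACME Therapeutics", "trial_phase": "Phase 2"},
    {"name": "Cobalt Pharma Ltd", "trial_phase": "Phase 1"},
]

merged, unmatched = match_and_merge(vendor_a, vendor_b)
```

In this sketch, records that find no counterpart are returned separately rather than dropped, so that they can be reviewed by hand. A production rule set would likely need fuzzier comparisons and more attributes than a name alone.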

Xenomorph’s EDM+ platform has been in user acceptance testing at RA Capital during Phase 1 of the implementation, which the firm plans to wrap up imminently. At that point, having onboarded all of the firm’s data, maintaining it becomes an incremental process of adding new companies and drugs as they become of interest to the firm, he says.

“We’re now starting to leverage what we’ve built. Even though we’re not technically live, we’ve had a number of ad-hoc projects where we were able to provide consolidated data from a number of other vendors to the research team, whereas previously they would have been tasked with manually maintaining the spreadsheets that we’re now doing systematically,” Gonzalez says.

“Now we have a central and systematic workflow of merging and matching data—so a lot of that manual work goes away. So, whereas the research team has previously been focused on data gathering, now they can focus more on analysis and validating the results.”

Something else that goes away is the potential for error that exists within manual processes. Caliri says replacing spreadsheets with a proper platform for data management not only reduces potential errors from copying and pasting data in Excel spreadsheets, but also eliminates the challenge of extracting data from a spreadsheet once it’s been entered into one.

Even so, there was still a lot of manual effort involved in setting up the platform.

“Because of the sheer volume of data we ingest, it was a big effort,” Gonzalez says. “Xenomorph automates a lot of the matching and merging, but there’s still a lot of manual reviews that needs to happen.”

Defying definition

RA Capital isn’t alone. Other firms attempting to use granular and non-traditional data to fuel investment models are encountering similar challenges. Xenomorph’s data-agnostic approach allows the vendor to easily onboard new data types—for example, the vendor has already run a proof-of-concept using wind turbine data for clients in energy markets. Yet, despite the complexities involved, some prefer to tackle these challenges in-house.

A good example is ESG data: a rich tapestry of information spanning myriad topics and data points, much of which defies the tabular and numerical formats of traditional financial data. Not only is the relevant information more likely to be textual and require translation into a numerical value, rating, or score, but each firm may place a different emphasis on each data point.

RadiantESG, a San Francisco Bay Area-based investment fund, focuses on using ESG factors and models to identify companies that represent good investments and that make a meaningful effort to incorporate ESG into their day-to-day business.

“We try to combine fundamental and ESG factors,” says Mauricio Bustos, head of data and technology at RadiantESG. “So, we’re looking for companies with a solid financial foundation, but on top of that, companies that demonstrate a credible intent to push forward in the three areas of ESG.”

However, while there’s plenty of ESG data available in the market, obtaining data from its original source in any kind of standard format can prove challenging for firms who see value in performing collection and analysis in-house on data including financial filings, lawsuits affecting a company, corporate and social responsibility reports, data from aggregators, and information from NGOs about controversial industries or companies.

Bustos says that regulators do not require companies to produce this data in a standardized format, which leads to firms needing to rely on third-party vendors, NGOs and public sources to do that. Firms also need them to extract information from, for example, 10-K and 10-Q filings about what a company is doing and to detect a company’s credible intent to contribute to these three pillars. Yet another issue is that ESG data doesn’t adhere to traditional delivery formats in the same way as other market data such as prices or corporate actions.

“It doesn’t come in a firehose, but rather trickles in,” Bustos says. “So, as it does, you have to build up a full picture of a company, like building a mosaic.”

That said, while corporate and social responsibility reports aren’t required in equities markets, plenty of companies already report that information voluntarily, so Bustos says it has been fairly easy to map existing data to the firm’s security master. RadiantESG made the strategic decision to run its data management in-house.

“Reference data management is a very important element that we need to own completely in order to have confidence that our model is working the way we expect it to,” Bustos says. “To make sure we’re talking about the same company, reference data integration—and making sure that we’re integrating that data in a consistent way—is a critical component.”

Normally, a firm would rely on an off-the-shelf EDM platform to piece together that mosaic of data and map it to market and reference data about a company or instrument. However, RadiantESG built its own, combining cloud technologies for storage with natural-language processing tools to aid with details of text extraction, such as for entity identification, sentiment detection, and keyword evaluation.

Bustos notes that the group that built RadiantESG came from quant shop Rosenberg Equities, which was part of AXA Investment Managers. At that firm, he says, they built everything from scratch.

“The cloud has been a huge benefit in terms of managing these large datasets. Because there are no standard formats, data needs to be extracted from free-form text—and that text takes up a lot of space, because anything we want to extract data from needs to be stored somewhere,” he says. “So, being able to rely on the cloud to store huge amounts of data has been very beneficial and affordable.”

RadiantESG is the exception to the rule in building its own platform to manage its data assets, but not in terms of using custom tools for handling custom datasets. The complexity of bending data models to accommodate unusual data types is something that most firms and vendors have trouble with, says Ethan Shen, CEO of Crizit, whose Periscope platform monitors enterprise data licenses and usage.

“Where we’ve seen non-traditional data sources, they are typically consumed by bespoke internal applications. Most finance deployments of EDM platforms are built around a data model that represents companies, tradable instruments, and events. Once you stray outside of the standard models, customizing large EDM platforms typically is much more painful than just building a custom application,” Shen says. “For the EDM platforms typically used in finance, the models are quite rigid. Customizing is possible, but often prohibitively slow and difficult.”

But as the use of non-traditional datasets widens, managing them in EDM systems—and monitoring usage and cost—will become essential. TRG Screen, for example, already supports the ability to define different data types in its Optimize Spend data inventory and spend management platform.

Richard Mundell, chief product officer at TRG Screen, says the vendor has customers tracking market data-adjacent categories such as research by brokers and other third parties, as well as more niche publications. All these information sources tend to fall to the market data teams to manage. Through Optimize Spend, clients can, for example, create their own user-defined fields and use them to track individual data elements that comprise ESG information.

“Bear in mind that they’re not storing the actual data in Optimize Spend, but instead they’re tracking the necessary information to track the subscription—so that’s all the details of the vendor, the contractual terms, the services covered under that contract, how those services are consumed by the firm (including platform and delivery mechanism, if applicable), and onward from there, the inventory of who has what and the cost allocations,” Mundell says.

TRG Screen needed to build extra features into Optimize Spend to account for the nuances of specific non-traditional datasets. For example, Mundell says, firms using expert networks such as Gerson Lehrman or AlphaSights tend to pre-pay for a set number of meetings and consume that balance over the duration of the agreement. Clients can define their pre-paid amounts, and the system will track and report their remaining balances. Mundell says that the company will add similar capabilities as clients expand their use of the service to new expense categories.
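The pre-paid balance tracking Mundell describes amounts to simple bookkeeping, which the following Python sketch illustrates. It is an assumption-laden illustration, not Optimize Spend’s data model: the class name, fields, and vendor name are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class PrepaidAgreement:
    """Hypothetical record of a pre-paid block of expert-network meetings."""
    vendor: str
    meetings_purchased: int
    meetings_used: list = field(default_factory=list)

    def remaining(self) -> int:
        """Meetings still available under the agreement."""
        return self.meetings_purchased - len(self.meetings_used)

    def record_meeting(self, description: str) -> None:
        """Consume one meeting from the balance, refusing to overdraw it."""
        if self.remaining() == 0:
            raise ValueError("pre-paid balance exhausted")
        self.meetings_used.append(description)


# Made-up agreement: ten meetings bought up front, two consumed so far.
agreement = PrepaidAgreement(vendor="Example Expert Network", meetings_purchased=10)
agreement.record_meeting("Call on drug pricing")
agreement.record_meeting("Call on trial design")
balance = agreement.remaining()
```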

Standards to the rescue

But the fact remains that without standard definitions and identifiers for data types that don’t yet have a formal structure, managing them will be up to bespoke in-house implementations and costly consulting projects that may solve one firm’s issues but can’t be scaled industry-wide to gain greater efficiencies.

However, standards could soon be on their way, thanks to efforts by the EDM Council, a reference data industry association, which—as part of its effort to expand its reach outside of the financial arena—is developing semantic ontologies for other industries that could create standards to also benefit investors monitoring those industries and companies within them.

Around four months ago, the EDMC began a partnership with major pharmaceutical industry players, including Bayer, Merck, Roche, GSK, Johnson & Johnson, and others, to develop a pharma-specific ontology, dubbed IDMP (Identification of Medicinal Products), which identifies everything from individual substances to units of measurement and maps them together. The primary driver for this effort is to make it easier for consumers to identify the same drug, which may have different names or different formulations, across different countries and jurisdictions. However, it could also make it easier for companies like RA Capital to identify and compare the offerings of different biotech and pharmaceutical manufacturers and provide a more structured basis for apples-to-apples financial analysis where that hasn’t previously been possible, or has only been possible by performing custom projects.

“It’s a mess in the pharma industry, and that mess is also what we’re being approached about for other industries, like ESG and agriculture. The EDM Council is getting involved in creating ontologies for these as well,” says Mike Meriton, co-founder and COO of the EDMC. “For example, we’ve formed a 100-company working group on ESG data, looking at how companies report, how vendors get that data and publish it, how consuming investment companies use it to make decisions, and how standards bodies standardize it.”

As a result, EDMC is now being asked to apply its expertise to create a climate ontology, Meriton says. The aim of an ontology is to harmonize information that is traditionally stored in different databases and referred to using different terminologies in each, so that a standard identifier can be mapped to the same data in each usage. That allows firms to aggregate separately stored data without having to consolidate it into a single, new database.
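The idea of mapping one standard identifier to differently named data in separate stores can be sketched in a few lines of Python. This is an illustration only: the identifier scheme, store names, local keys, and values are all hypothetical and are not drawn from any EDM Council ontology.

```python
# Two hypothetical stores that hold data on the same concept under different local names.
emissions_db = {"Plant 7 (North)": {"co2_tonnes": 1250}}
permits_db = {"FAC-0007": {"permit_status": "active"}}

STORES = {"emissions": emissions_db, "permits": permits_db}

# Hypothetical crosswalk: one standard identifier, mapped to each store's local key.
CROSSWALK = {
    "STD-0001": {"emissions": "Plant 7 (North)", "permits": "FAC-0007"},
}


def lookup(standard_id: str) -> dict:
    """Gather what each store holds for one standard identifier, leaving the data in place."""
    view = {}
    for store_name, local_key in CROSSWALK[standard_id].items():
        record = STORES[store_name].get(local_key)
        if record is not None:
            view[store_name] = record
    return view


combined = lookup("STD-0001")
```

Each store keeps its own terminology and its own data here. Only the crosswalk is shared, which is what lets the lookup aggregate across stores without copying them into a new database.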

“So, whether you’re an investment analyst doing drug discovery or a pharmacist helping someone find the right drug, an ontology… allows you to identify, correlate, and match data at scale, whereas today that uses multiple different standards,” Meriton says.

Once standards are structured in a way that they can be applied to any company, product, or industry, then data on all those elements will have the same attributes as other, structured types of financial, market, and reference data, and firms will be able to more easily manage any potential data input alongside—and seamlessly integrated with—the data they use to make trading decisions, without the need for major, custom integration projects.
