Man Group’s proprietary data platform is a timesaver for quants

The investment firm’s head of data delves into its alt data strategy and use of AI tools to boost quant efficiency.

In elite sports, gaining even the smallest advantage can mean the difference between winning and losing. When Dave Brailsford, the mastermind behind British Cycling’s success in the 2010s, came in to transform the team, he proposed a strategy known as “the aggregation of marginal gains”. He hypothesized that if everything that goes into riding a bike was improved by 1%, the cumulative increase would be massive.  

Improvements such as better cycling equipment were obvious, but smaller things also improved the team’s fortunes: hiring doctors to teach the cyclists the best way to wash their hands to reduce their chances of getting sick, finding the best massage gel to aid muscle recovery, and determining the type of pillow that led to the best night’s sleep. Between 2007 and 2017, under Brailsford’s watchful eye, British cyclists won 178 world championships and 66 Olympic or Paralympic gold medals.

The marginal gains theory is also an integral part of staying competitive as a hedge fund. Every piece of data that a hedge fund has that its competitors do not constitutes a possible advantage, but useful datasets are often adopted by multiple buy-side firms, leading to alpha decay and, subsequently, a decline in their usefulness.

Hinesh Kalian, head of data at Man Group, understands the value of the marginal gains that alternative data investment provides. 

Around five years ago, Man Group began building out a data science function by designing a central platform to ingest all types of data for end-users and improve the efficiency of its quants. For Kalian, this meant the platform needed to be able to ingest, transform, and store data. While this could be done in the cloud, the hedge fund elected to do most of the build on-premises, with assistance from cloud technologies wherever it saw fit.

The first step in the project involved optimizing the team to ensure specialists were in their element, rather than operating as roving jacks of all trades.

“To start off, you needed to move away from having generalists owning and managing data,” says Kalian, who joined Man Group in 2010. “Previously, you could have a technologist that’s doing a bit of coding, a bit of engineering, but also owning some parts of the data estate.” 

He explains that the personnel changes were not limited to data engineers—they also involved recruiting specialist roles, such as data scientists, data managers, and data scouts. Once the team was assembled, Kalian and Man Group set their sights on transforming large quantities of data into usable material for their quants.

Despite the name, data transformation isn’t necessarily a complicated process. It can be as simple as changing date and time formats, aggregating fields, or changing a string to an integer. While such examples are simple tasks, they are laborious when performed at scale.
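
As a rough illustration of the kinds of transformation described above, the following Python sketch reformats dates, converts a string field to an integer, and aggregates a field. The rows, field names, and date format are invented for the example and are not drawn from Man Group’s platform.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical raw vendor rows: every value arrives as a string.
RAW_ROWS = [
    {"date": "03/01/2024", "ticker": "AAA", "units_sold": "120"},
    {"date": "03/01/2024", "ticker": "AAA", "units_sold": "80"},
    {"date": "04/01/2024", "ticker": "BBB", "units_sold": "45"},
]


def normalize_row(row):
    """Convert the date to ISO 8601 and the count from string to integer."""
    # Assumes the vendor sends day/month/year dates.
    parsed = datetime.strptime(row["date"], "%d/%m/%Y")
    return {
        "date": parsed.date().isoformat(),
        "ticker": row["ticker"],
        "units_sold": int(row["units_sold"]),
    }


def aggregate_units(rows):
    """Sum units_sold for each (date, ticker) pair."""
    totals = defaultdict(int)
    for row in rows:
        totals[(row["date"], row["ticker"])] += row["units_sold"]
    return dict(totals)


clean_rows = [normalize_row(r) for r in RAW_ROWS]
daily_totals = aggregate_units(clean_rows)
print(daily_totals)
```

Each step is trivial for three rows; the labor Kalian refers to comes from repeating it across many datasets, each with its own quirks.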

“Along the way, you’re going to get a shopping list of transformations that are generally common and that anyone will need to do on raw data. To make that efficient, scalable, and quick, you want to almost preempt what the end-user is going to require when they see that raw data,” Kalian says.

Man Group’s resulting central data platform has greatly improved the efficiency of its quants. The previously daunting tasks of selecting datasets to clean, compile, and analyze became easier once data was readily accessible and categorized internally.

“It’s like a search engine for datasets that we have internally for the firm,” Kalian explains. He says it plugs into an external catalog, through which the data is categorized and curated by a data management team. This means end-users can query individual datasets by category and determine whether they have permission to access the data. 

He describes it as the entry point to go and find data for anyone at the firm. If a user wants a particular dataset, they can submit a request to the data science team at the click of a button.
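
To make the idea concrete, a catalog lookup of this kind could be sketched in Python as follows. The dataset names, categories, and group-based permission model are all assumptions made for illustration; the article does not describe how Man Group’s catalog is implemented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CatalogEntry:
    """One curated dataset in a hypothetical internal catalog."""
    name: str
    category: str
    allowed_groups: frozenset


# Invented entries; real categories and entitlements would come from
# the data management team.
CATALOG = [
    CatalogEntry("web_traffic_daily", "web-scraped", frozenset({"quant", "data-science"})),
    CatalogEntry("card_spend_weekly", "consumer-transactions", frozenset({"data-science"})),
    CatalogEntry("app_downloads", "web-scraped", frozenset({"discretionary"})),
]


def search_catalog(catalog, category, user_groups):
    """Return (dataset name, has_access) for every dataset in a category."""
    return [
        (entry.name, bool(entry.allowed_groups & user_groups))
        for entry in catalog
        if entry.category == category
    ]


results = search_catalog(CATALOG, "web-scraped", frozenset({"quant"}))
print(results)
```

A dataset that comes back without access is where the request step described above would begin.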

Alternative ideas

Alternative data, meaning datasets collected from non-traditional sources, has been championed by hedge funds and asset managers as a means of gaining an edge over competitors. Some alt datasets, such as those scraped from the web or consumer transaction data, can both offer long histories (and, therefore, more data for future predictions) and be niche enough that integrating them into the investment process provides an advantage.

Kalian says that over the past eight years, the alternative data vendor space has matured as vendors learned to assess the value of their datasets for potential investors. Investors became more discerning about their choices of datasets, and the vendor landscape shifted to provide datasets that were less esoteric and offered more consistent results. Man Group crafted its approach to alternative data in a similar way.  

“When we started off looking at alternative data, we couldn’t just go out there with the view of, ‘Let’s find everything that’s out there, and let’s try and bring it in and figure out what to do with it,’” he says. “We asked ourselves, ‘What does it mean to ingest unstructured data? What skillsets, talent, and technology stack do you need to do this and how can we do this at scale? How relevant is it to what we trade and invest in for our clients?’ Then we built out a picture that way.”  

Bill Dague, vice president and head of data product at Nasdaq, agrees that the alternative data market has matured as buy-side firms have come to appreciate its value. Dague spearheaded Nasdaq’s acquisition of Canadian alt data provider Quandl in 2018, and since then has helped build out the stock exchange’s alternative data offering. Dague says that as alternative data has been recognized as a valuable way of finding alpha, incorporating it into investment portfolios has become non-negotiable.  

“A client said to me yesterday, ‘Amateur hour is over in the data game,’ and we need to really make sure that we’re ready for the next phase,” Dague says. “Alt data isn’t alt anymore; it’s table stakes for most folks.”  

Kalian notes that Man Group looks at more than 200 datasets per year, but only a few get passed to an engineer. 

Dague points out that this is a common feature of alternative data use. While the market has experienced some consolidation in recent years, there are still plenty of opportunities for information overload.

“It’s not like everything [in alternative data] is relevant and useful. In fact, I think what we learned as an industry over the last few years is that most things are irrelevant and not useful,” says Dague, adding that firms investing in alt data must be “agile” about it and have a process that allows efficiency in ingesting and interpreting data sources. 

When WatersTechnology spoke to asset manager Vanguard about its Quantitative Equity Group’s strategy for selecting alternative datasets, QEG head John Ameriks explained that the group “didn’t pay attention to information that has a very, very short half-life in and of itself,” and instead prioritized datasets with longer histories. 

Man Group’s Kalian, however, disagrees with this approach.

“I’ve got quite a contrarian view on this,” he says. “While I agree that you want datasets that have a long history and a proven record of generating some sort of consistency, I also think that you are following the crowd there. The way I see alternative data in a holistic view is that it’s more about what I call ‘information arbitrage’. You want to get a timely read on information related to the financial environment. To do that, you can’t look at one dataset or those that had a proven record over multiple years in isolation. You need to bring a variety of different data sources together to give you some indication of what’s going on to forecast a tradeable asset or the macro environment.”

The god in the machine 

When selecting alternative datasets to onboard, Man Group considered how much extra quantitative work goes into cleaning, sorting, and identifying the information for internal use. To minimize wasted time, the hedge fund identified some tech-based solutions that could help its quants and project managers spend less time on grunt work and more time on actionable tasks.

James Maxfield, head of product and solutions at Duco, a no-code data reconciliation company that counts Man Group as a client, says the lack of a standardized format for the alternative datasets the hedge fund acquires can stymie efficiency.  

“At the moment, for a customer like Man Group, to extract that data out of a PDF or [other] unstructured data [sources] is quite a human-intensive process,” Maxfield explains. “Someone’s probably got to go into that email, read it, find the bits of data they want in the PDF, cut and paste it, put it somewhere else and make it standardized, and then feed it into whatever process they’ve got.” 

Maxfield says emails containing data in a non-tabular format, such as free text, are a problem for Duco’s customers because a human must carry out a lot of menial tasks just to make the dataset usable, rather than making something great out of it.  

In addition to employing external help for its dataset selection process, Man Group has developed its own internal solutions. The hedge fund has been using some form of machine learning or AI for more than a decade, and last year announced a proprietary version of ChatGPT, dubbed ManGPT, for internal use. The firm deploys AI in its execution models to determine how to perform smart-order routing, as well as to assemble discretionary datasets that help the company’s portfolio managers.  

Kalian says one use case for the technology is summarizing documents for Man Group’s portfolio managers, who typically must analyze large stacks of documents manually before drawing an investment thesis.

“It’s enabling a portfolio manager to get a very quick and holistic set of information to make an investment decision,” Kalian says. “That could be a summarization of information, yes, but that could be helping a portfolio manager understand the impact of their last trade to determine a behavioral bias, or a pattern we’ve seen historically with them, or whether they are moving into an environment that may be changing because the data is saying something different, or whether we are at an inflection point. This can be more of a learning-based mechanism to give them a pre-warning to decide what they should do in this environment.” 

He explains that the team took care to choose the implementation of AI that works best for the company. “We need to be careful about creating large neural networks that are so complex that it’s difficult to decipher and understand what’s happening under the hood.”

Reflecting on last year’s boom in popularity for generative AI, Kalian says the biggest upshot was that it has put useful AI tools and large language models into the hands of non-AI experts to experiment freely.  

One of the AI-driven use cases Man Group is working on internally is the ability to quickly summarize actionable features from datasets for further quantitative analysis. Kalian explains that features are basic kinds of signal inferred from datasets, and quants research these features daily to create larger models for use in portfolio construction. When quants pass the bulk of that effort along to Man Group’s AI-based ‘alpha assistant’, the program can create primitive versions of these models itself.

Within ManGPT, users can ask the assistant to build a basic quintile portfolio based on data features that target certain stocks’ key performance indicators, and it will return a basic view of the risk profile and historical back-test results. ManGPT has access to all the underlying data, which is curated in a form that can quickly give cursory, fundamental feedback to a portfolio researcher, data scientist, or quant researcher.

“That’s like a pre-initial screening pathway to generate new alpha ideas,” Kalian says. “They go, ‘Hmm, that looks good. I should dig into that a bit more.’” 
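
For readers unfamiliar with the term, a quintile portfolio ranks stocks by a feature and splits them into five buckets, typically comparing the top bucket with the bottom one. The Python sketch below shows one common, simplified version of that screen. The tickers, feature values, and returns are invented, and nothing here reflects how ManGPT builds its portfolios.

```python
from statistics import mean

# Invented data: a feature value and a next-period return per stock.
FEATURES = {
    "S01": 0.1, "S02": 0.2, "S03": 0.3, "S04": 0.4, "S05": 0.5,
    "S06": 0.6, "S07": 0.7, "S08": 0.8, "S09": 0.9, "S10": 1.0,
}
RETURNS = {
    "S01": -0.02, "S02": -0.01, "S03": 0.00, "S04": 0.01, "S05": 0.00,
    "S06": 0.01, "S07": 0.02, "S08": 0.01, "S09": 0.03, "S10": 0.04,
}


def quintile_buckets(features):
    """Rank tickers by feature and split them into five buckets, lowest first."""
    ranked = sorted(features, key=features.get)
    size = len(ranked) // 5
    if size == 0:
        raise ValueError("need at least five stocks to form quintiles")
    buckets = [ranked[i * size:(i + 1) * size] for i in range(4)]
    buckets.append(ranked[4 * size:])  # top bucket absorbs any remainder
    return buckets


def long_short_return(features, returns):
    """Average return of the top quintile minus that of the bottom quintile."""
    buckets = quintile_buckets(features)
    top = mean(returns[t] for t in buckets[-1])
    bottom = mean(returns[t] for t in buckets[0])
    return top - bottom


buckets = quintile_buckets(FEATURES)
spread = long_short_return(FEATURES, RETURNS)
print(buckets[0], buckets[-1], round(spread, 4))
```

A positive spread in a quick screen like this is only a prompt for further research; a real back-test would also account for costs, risk, and many periods of data.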

Kalian hopes to see more from the alternative data vendor community going forward. But he believes one challenge for the alternative data space is the “lack of innovation” outside of traditional equities.

“Vendors thinking a bit more about how you can piece up different elements within a different asset class would be more interesting, instead of just trying to forecast a company’s revenue.”
