The Dark Art of Pre-Trade Analytics

Financial firms commonly review trading activity after the fact to improve their execution strategies. But what they’d really love to do is perform that analysis in real time, pre-trade. Max Bowie looks at how far along market participants are in pursuit of this goal, and the significant challenges to achieving it.

While a plethora of tools claim to perform pre-trade analysis, many simply factor the proposed trade into a firm’s preset risk parameters, which determine how much of an asset the firm is comfortable holding. Actually delivering a real-time, pre-trade tool for predicting market impact or performing transaction cost analysis (TCA) that provides an indicator as accurate as a trader’s “gut feel” is harder and more resource-intensive than one might first think.


“Anyone who claims to have super-accurate pre-trade TCA on a per-order basis is lying to you. Unless it’s one extreme or another—i.e. a massive buy or sell order that will almost certainly drive the price up or down—it’s very hard to predict because you can’t tell who else is in the market at the same time. You could be buying a stock, but someone else could be selling twice as much of the same stock, more aggressively,” says Gerrit Van Wingerden, managing director of Japan at order and execution management system vendor Tora.

Perhaps the greatest challenge facing anyone aiming to predict how a market will react to a trade is grasping how that market behaves. To achieve that, firms need a simulator that can act like the marketplace itself, and that contains the entire history of quote and trade data from each market so it can accurately reflect that market’s trading activity.

These simulators may have originally been conceived as testing environments to demonstrate the robustness and functionality of new applications designed for use with exchange data and trading platforms, but have evolved into engines for gauging strategy performance.

“You might want an exchange simulator for user acceptance testing of anything that talks to an exchange, such as for regression testing around changes—so while that wouldn’t tell you about the profitability of your strategy, it does tell you that everything’s working,” says Mark Skalabrin, CEO of ticker plant and feed handler provider Redline Trading Solutions, which has offered its Mars market simulator for several years for regression testing purposes. “The next use case would be to tell you how well an algorithm is working—for example, to test whether an order would have been filled by just trying to trade against the best bid or offer… which is fine for orders where your strategy won’t move the market.”
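To make that simpler use case concrete, the sketch below (in Python) replays historical top-of-book quotes and checks whether a resting limit order would have become marketable against the best bid or offer, assuming, as Skalabrin notes, that the order is too small to move the market. The data layout and field names are illustrative, not Redline’s.

from dataclasses import dataclass

@dataclass
class Quote:
    ts: float    # event timestamp in seconds
    bid: float   # best bid price
    ask: float   # best ask (offer) price

def would_fill(quotes, side, limit_price, arrival_ts):
    """Replay historical quotes and return the first time a resting limit
    order would have crossed the touch, or None if it never does."""
    for q in quotes:
        if q.ts < arrival_ts:
            continue  # the order was not yet live
        if side == "buy" and q.ask <= limit_price:
            return q.ts  # the offer came down to (or through) our limit
        if side == "sell" and q.bid >= limit_price:
            return q.ts  # the bid came up to (or through) our limit
    return None

# Illustrative replay: a buy limit at 100.02 entered at t=1.0 fills at t=1.8
tape = [Quote(0.5, 99.99, 100.04), Quote(1.2, 100.00, 100.03), Quote(1.8, 100.01, 100.02)]
print(would_fill(tape, "buy", 100.02, arrival_ts=1.0))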

This common usage relies on real historical prices and doesn’t attempt to factor the orders into the historical data to measure market impact. A more complicated use case, and one that the vendor is now evaluating with clients, is to simulate individual exchange matching engines, Skalabrin says: users post “orders” to the simulated engine and see those orders show up in the “market data” it generates, creating a synthetic reality that shows how their trades affect the market data stream.
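A heavily simplified illustration of that second use case might look like the toy price-time matching engine below: orders posted to the simulator rest in (or trade against) its book, and every call returns a synthetic top-of-book tick in which those orders are visible. The class and method names are hypothetical and bear no relation to Redline’s actual product.

import heapq

class SimMatchingEngine:
    """Toy price-time order book whose synthetic feed reflects posted orders."""
    def __init__(self):
        self.bids = []  # max-heap via negated price: (-price, seq, qty)
        self.asks = []  # min-heap: (price, seq, qty)
        self.seq = 0

    def post(self, side, price, qty):
        self.seq += 1
        book, opp = (self.bids, self.asks) if side == "buy" else (self.asks, self.bids)
        # Match against resting orders on the opposite side while the price crosses.
        while qty and opp:
            top_price = opp[0][0] if side == "buy" else -opp[0][0]
            crosses = top_price <= price if side == "buy" else top_price >= price
            if not crosses:
                break
            p, s, q = heapq.heappop(opp)
            traded = min(qty, q)
            qty -= traded
            if q > traded:
                heapq.heappush(opp, (p, s, q - traded))
        if qty:  # whatever is left rests in the book and shows up in the feed
            key = -price if side == "buy" else price
            heapq.heappush(book, (key, self.seq, qty))
        return self.snapshot()

    def snapshot(self):
        """Synthetic 'market data' tick generated by the simulated engine."""
        return {"bid": -self.bids[0][0] if self.bids else None,
                "ask": self.asks[0][0] if self.asks else None}

engine = SimMatchingEngine()
print(engine.post("sell", 100.05, 500))  # {'bid': None, 'ask': 100.05}
print(engine.post("buy", 100.03, 300))   # our resting bid appears in the feed
print(engine.post("buy", 100.05, 200))   # an aggressive buy eats part of the offer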


“From what I’ve seen, people are in the infancy stages of addressing this problem. Most firms don’t even have an accurate replay of what actually happened, let alone how the market might react,” says Shimrit Or, senior professional services consultant at UK-based performance monitoring and analysis technology vendor Velocimetrics, whose mdPlay tool provides data capture and analytics. The vendor originally developed the solution to replicate problematic events from trading systems within a test environment, though Or says she has seen clients use it for many different purposes, including for testing algorithms, smart order routers, and vendor applications such as feed handlers.

“Most firms are still at the stage of looking at whether their data is complete—i.e. doesn’t have gaps. They’re not even at the stage of looking at the data quality itself,” Or says, adding that this involves not just an accurate replica of live data from the markets, but accurate timestamps and clock synchronization.
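A rough sketch of that first-stage check, under the assumption that each captured message carries an exchange sequence number and a local receive timestamp (both field names are illustrative), might simply scan for sequence gaps and for timestamps that run backwards, the latter being a typical symptom of poor clock synchronization.

def audit_capture(messages):
    """messages: iterable of (seq_no, recv_ts) pairs from a captured feed.
    Returns sequence-number gaps and timestamps that go backwards."""
    gaps, clock_issues = [], []
    prev_seq, prev_ts = None, None
    for seq, ts in messages:
        if prev_seq is not None and seq != prev_seq + 1:
            gaps.append((prev_seq, seq))      # messages missing in between
        if prev_ts is not None and ts < prev_ts:
            clock_issues.append((seq, ts))    # time went backwards
        prev_seq, prev_ts = seq, ts
    return gaps, clock_issues

capture = [(1, 0.001), (2, 0.002), (5, 0.0015), (6, 0.004)]
print(audit_capture(capture))  # ([(2, 5)], [(5, 0.0015)])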


Precision timing is a key component of the data that helps firms gain a deeper and more accurate understanding of their trading performance. “Without good timing information, you have no ability to understand what’s going on in your trading system. You have no way of knowing if someone is front-running you if you don’t have good timing information. The ability to tell what happened depends on your clocks being accurate,” says Victor Yodaiken, CEO of timing technology vendor FSMLabs. “Firms may have multiple gateways to different—or the same—trading party, and at some points in the day, some may be busier than others. And there’s no way to know that during the trading day. You may even see something that looks like a problem, but is actually a result of your clocks being off…. In the past, companies have told us that they’ve been able to tweak their algos because their clocks were better.”

Data Quality is Key

The specifics required to build an accurate analytics tool vary depending on the asset class being served, but all share one fundamental requirement: quality, granular data. In particular, any analytic designed to guide a trading algorithm must have access to a significant amount of historical data to gauge how markets have reacted to past events.


“From a quantitative perspective, we want to answer the question of what does something mean over time, and whether it adds to or detracts from our performance,” says Kevin Shea, CEO of Boston-based registered investment advisor Disciplined Alpha. “We can’t look at something that just happened one time. We’re not a black box with a hundred factors changing from one month to the next… so we want data going back a long time.”

That’s where established brokers and technology providers can bring their resources to bear. “We handle about 11 percent of institutional volume in Japan. As a result, we have a very large database of historical order data. We have assumptions about what impact an order will have on the market, and we test those. We can do simple regression testing, or more recently, we’ve been using artificial intelligence and machine learning,” says Tora’s Van Wingerden. “We have training and test datasets, and from there we can see how good our models are, by measuring slippage against the mid price, and you can come up with what on average is a fairly good measure. We look at liquidity consumption, volatility, and spread [as inputs to our pre-trade TCA]. We also take into account the broker-specific algorithm being used. There can be significant differences based on the broker algorithm—for example, some do better when executing small-volume trades, or in lower-volatility names.”
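As a rough, stripped-down illustration of the kind of workflow Van Wingerden describes, the sketch below measures realized slippage against the arrival mid price and fits an ordinary least-squares model on liquidity consumption, volatility and spread to produce a pre-trade cost estimate. The feature construction and sample numbers are invented for illustration; they are not Tora’s model.

import numpy as np

def slippage_bps(side, avg_exec_price, arrival_mid):
    """Signed slippage vs the arrival mid, in basis points (positive = worse)."""
    sign = 1 if side == "buy" else -1
    return sign * (avg_exec_price - arrival_mid) / arrival_mid * 1e4

print(round(slippage_bps("buy", 100.06, 100.00), 1))  # 6.0 bps worse than mid

# Illustrative history: (liquidity consumed as % of ADV, volatility, spread in bps)
X = np.array([[0.02, 0.15, 3.0],
              [0.10, 0.25, 5.0],
              [0.05, 0.20, 4.0],
              [0.20, 0.30, 6.0]])
y = np.array([1.2, 6.5, 2.8, 11.0])  # realized slippage in bps for those orders

# Ordinary least squares with an intercept: a crude stand-in for the
# regression and machine-learning models mentioned in the article.
A = np.column_stack([np.ones(len(X)), X])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)

def predict_cost_bps(liquidity_pct_adv, volatility, spread_bps):
    return float(coef @ np.array([1.0, liquidity_pct_adv, volatility, spread_bps]))

print(round(predict_cost_bps(0.08, 0.22, 4.5), 2))  # estimated pre-trade cost in bps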

Denver, Colo.-based data, analytics and trading technology provider CQG tells a similar story: “CQG has been around awhile, so we have a wealth of historical data. We have back-testing functions, and have tools that allow people to write strategies, back-test them, plug in costs, and see if they will make money. We’ve had that ability for years,” says John Arvanites, CTO at CQG, whose latest initiative is applying those tools to third-party algorithms to measure TCA and support execution of block trades without moving the market. 
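In the same spirit, a bare-bones version of “back-test it and plug in costs” could be as simple as the sketch below: replay a target-position signal against historical prices and subtract a flat fee every time the position changes, to see whether the strategy still makes money. The fee figure and data are placeholders, not CQG’s.

def backtest_pnl(prices, signal, fee_per_trade=1.30):
    """prices: closing prices; signal: target position (+1, 0 or -1) per bar.
    Returns gross P&L from holding the position, minus a flat fee per position change."""
    gross, fees, pos = 0.0, 0.0, signal[0]
    for i in range(1, len(prices)):
        gross += pos * (prices[i] - prices[i - 1])  # P&L on the position held into this bar
        if signal[i] != pos:
            fees += fee_per_trade                   # pay to change the position
            pos = signal[i]
    return gross - fees

prices = [100.0, 101.0, 100.5, 102.0, 101.5]
signal = [1, 1, 0, 1, 0]
print(round(backtest_pnl(prices, signal), 2))  # -3.9: breaks even gross, loses once fees are included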

“We’ve had conversations around pre-trade TCA, but haven’t implemented that yet. It’s difficult because of the data, and also because of the algos themselves, because you need to know how an algo will react. So we can do that for our own model, but not for external algos…. It’s partly to do with IP—there’s a trust issue, where algo providers don’t want to share the IP of their strategies,” Arvanites adds. “We’ve had some requests to see ‘Once my order was fully filled, what is the outcome—for example, was I too aggressive?’ So we are tweaking some of our TCAs to give more of that kind of information. We can overlay any of our TCAs over any of our studies or any market activity.”

Though CQG’s efforts may be pushing the TCA envelope, Arvanites says basic data quality is key. “The first priority is quality of data. We have mechanisms in place to scrub and correct data and make sure it’s accurate. Sometimes we get trades, bids or asks that are completely out of line, or simply didn’t happen. So pulling out a trade, and going back through the historical data and pulling it out of there, then correcting the data and pushing it out worldwide… it’s not rocket science, but having the systems in place to correct it after the fact is important.”
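A much-simplified sketch of that kind of scrub might flag any print that deviates from a rolling median of recent prices by more than a set tolerance and drop it before the history is republished. The five percent threshold and data layout below are arbitrary illustrations, not CQG’s actual rules.

def scrub_trades(trades, window=20, max_dev=0.05):
    """Drop trade prints that deviate from a rolling median of recently kept
    prints by more than max_dev (e.g. 5%). trades: dicts with 'ts' and 'price'."""
    clean = []
    for t in trades:
        recent = [x["price"] for x in clean[-window:]]
        if recent:
            ref = sorted(recent)[len(recent) // 2]  # rolling median of kept prints
            if abs(t["price"] - ref) / ref > max_dev:
                continue                            # out-of-line print: remove it
        clean.append(t)
    return clean

tape = [{"ts": i, "price": p} for i, p in enumerate([100.0, 100.1, 99.9, 150.0, 100.2])]
print([t["price"] for t in scrub_trades(tape)])  # [100.0, 100.1, 99.9, 100.2]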

Those systems for collecting, cleaning and processing don’t appear overnight. Behind any successful analytical or TCA tool sits a significant amount of infrastructure. And that doesn’t come cheap or easy.


“There are high barriers to entry. It’s hard to justify just building a TCA product. But if you can build the trading algorithms, that puts you 80 or 90 percent of the way there. For example, we have the exchange links, so everything is in real time. If you already have that… it’s easier to layer a TCA component on top,” says Guy Cirillo, head of partnership sales at broker and technology provider Quantitative Brokers in New York, which is also planning its own pre-trade TCA product. 

“You need a lot of storage, you need historical data, real-time data, and software to run the analysis. So code has to be written, and you need servers in proximity to where the executions are taking place,” Cirillo adds. “From a Quantitative Brokers point of view, the data was already there because we had the algorithms. Once that was in place for the execution component of the business, building models was the next challenge. That’s where we feel we have an advantage. If you have people who can build trade execution models—i.e., the algorithms—you can build software that analyzes them.”

One factor that makes these analytics so costly—in addition to the infrastructure required—is the amount of data flowing across that infrastructure that must be processed. Equity options in particular present a challenge because of the number of strikes compared to underlying equities. 

“Not all of those strikes actively trade, but the sheer volume of market data does add a level of complexity. In the US, we’re now up to 15 options exchanges… and to get a complete picture of the market, you have to subscribe to data from all of them. Paying for that, bringing it into your organization, and reading that data all adds to the angst and cost,” says Tom Lehrkinder, senior analyst at Tabb Group.


“If you start getting into more complex orders, such as multi-legged trades, it becomes a substantial project to get up and running in terms of time and money, when there are shops available that make this almost plug-and-play,” Lehrkinder adds. “There are a couple of schools of thought: Build-your-own is really, really difficult. Some of the big market makers and high-frequency traders have the resources and wherewithal to build their own. Then you get a hybrid approach at smaller firms who’ll buy something from a vendor and may supplement that with something they’ve acquired or developed themselves. And then there are firms who only use vendor products.” 

Certainly, those at the sharp end of the debate are picking sides on build versus buy. “Most of the time, if you want something unusual, you have to build it yourself. It’s rare to be able to buy all the features you might want. It seems like everybody out there has much the same thing,” says Harindra de Silva, president and portfolio manager at Los Angeles-based investment manager Analytic Investors.


Redline’s Skalabrin agrees, even though the vendor offers a solution of its own for this purpose. “Everybody builds this themselves for the most sophisticated use cases. Banks spend a lot of time building market simulators and building intelligence into them,” he adds.

In Quantitative Brokers’ case, it built the tools itself as a way to drive clients to its trading services. “We had to build our own TCA tool and our own market simulator because when you go to a client and say, ‘This algorithm will perform better than what you’re currently using, even including the fees we will charge,’ you need to be able to prove that,” Cirillo says.

Problem Data and Human Factors

For many well-established datasets, even outside vanilla equities, obtaining the data isn’t hard. Even more nuanced data associated with exchanges and other trading venues such as dark pools, which can be harder to find but goes further towards building a full picture of the markets, is becoming easier to obtain. For example, Analytic Investors’ de Silva says trading venues are becoming better at sharing information such as historical volumes and strategy decay, factors that would enable traders to pick their destination venues on a day-to-day basis.

But firms wanting to incorporate alternative datasets into their strategies face a bigger challenge in that the data may simply not exist, or have only been collected for a limited time. 


“In the past, firms might have asked vendors for 10 or 20 years of data. But new datasets might have two years of data,” says Michael Raines, director of quantitative data solutions at event data provider Wall Street Horizon. “Say we add a new dataset and clients ask how much history we have. I say ‘It’s been ready for six months, so I have six months of history,’ because we won’t back-fill data. So if we have a hole, then we have a hole, and if someone else claims to have that, I can’t necessarily trust how they got it—it needs to be self-sourced or primary-sourced.”

Others note that for newer datasets or parameters with less available history, there may not be enough examples of how something impacts the market to be statistically significant.

One area where little data exists to incorporate into pre-trade analytics—and far harder than testing whether an order would have executed historically, or how an exchange matching engine might respond in a test environment—is the unknown human factor: how another trader or competing algorithm will respond to an order. Evan Schnidman, founder and CEO of Prattle, a technology company that uses sentiment analysis to predict the market impact of central bank and corporate communications, suggests two ways—neither perfect—to achieve this.

The easiest approach would be to simply poll traders about how they would respond in specific market conditions, though Schnidman notes that this is less scientifically rigorous because people may not necessarily answer truthfully. The optimal way to model trader reactions would be to set up “a true simulation using real money so that people would not take undue risk,” and conducted in a sufficiently large simulated market to deliver statistically significant results, yet self-contained so that those running it can test specific scenarios, or test how different individuals and collective user profiles react when emotional factors such as panic or exuberance are introduced into the mix.


“But no one has ever successfully simulated all the dimensions of a stock market,” he says. “It would take tens or even hundreds of millions of dollars to get the level of information you’d need [from a simulation]. Firms may already have this data, but are not likely to share it. The Securities and Exchange Commission may have it, possibly anonymized. Or you could use 13-F [holdings] filings from the previous quarter if you know which prime broker funds use. It’s not easy to find, and that information would give you data for the trailing quarter at best.”

Even then, how traders react would likely be a best guess based on how the collective herd moves in a given situation. 

“The behavioral stuff is tough, because people react differently. For example, what’s the lifecycle of an algo before it gets detected? Hours? Days? If you get lucky, a week or two? So any arbitrage opportunities are quickly swallowed, and someone quicker will take over that spot. There’s still a lot of the human factor, and I think it will be a while before that goes away, and I don’t know how you would use machine learning to model how people think and react,” says CQG’s Arvanites.

Here and Now

For some, existing pre-trade analytics deliver all they need. “Now, pre-trade analytics have evolved to include optimization tools that enable you to build a ‘trade list’ and see how it looks from a pre-trade perspective, to look at where the volume and liquidity are in a stock so you can route orders to the right venues, and what are the ‘tilts’ in the list,”—those stocks that “tilt” a portfolio to outperform a benchmark—says Analytic Investors’ de Silva.

And especially beyond the equities market, there’s a long way to go, not just in terms of what firms could build, but also in how much demand there is from market participants for more sophisticated pre-trade tools. You would think firms would be clamoring for anything that delivers additional insight, but outside of the more advanced firms, there seems to be a steep learning curve ahead.

“This area is old-school for the equities markets, but in futures, options and other asset classes, this is still in the early stages. We’re out there educating the market,” Cirillo says. “You have to make sure you have a clean source of data… to do TCA in real time during the day. But there is a cost associated with that…. And is that something clients really want or need when we’re still educating clients on TCA? Given that, the tools we’ve created are more than sophisticated enough to meet their needs.”

Inevitably, clients will become more sophisticated, and tools will become readily available. “As time goes on, things that are not practical today will soon become practical. Costs are constantly dropping. A trader needs to understand what he’s going to get in per-tick revenue against what he pays in exchange and clearing fees, and in taxes—so he doesn’t build anything without taking those into account. Newbies don’t, and they quickly run out of cash,” Arvanites says. 
