The ABCs of NLP: How Trading Firms Used NLP to Navigate 2020

This year, natural language processing came to the fore in capital markets, helping firms of all kinds parse huge, unstructured datasets.

In 2020, investors discovered that the sophisticated algorithms they were using to drive alpha, which relied on long histories of data to build an understanding of markets, failed spectacularly in the extreme context of the pandemic, forcing a reassessment of these models.

Desperate times call for new ways of understanding the world, and many asset managers and banks turned to alternative data—everything and anything from air quality indexes to traffic jam counts to social media posts, and of course alternative forms of healthcare data—in an attempt to better understand the volatility roiling the market. Much of that data (tweets, earnings call transcripts, regulatory documents) is text-based and unstructured and, as a result, natural language processing (NLP) came into its own in the capital markets.

Trading firms have long been excited by the technology, if not always fast to adopt it for technical and regulatory reasons, but many banks and asset managers are now not only using NLP algorithms to trawl for insights into the pandemic’s impact on companies, but also for more quotidian tasks, such as improving the experience of customers interacting with their chatbots, interpreting internal documents, and analyzing companies in investment portfolios.

Vendors and data providers catering to the needs of these firms have bolstered their NLP offerings for a range of needs. As Amanda Stent, NLP architect at Bloomberg, put it, what was previously an academic discipline has become a proposition for product development, especially with the evolution of the field represented by BERT, Google’s transformer model, which the tech company released in 2018.

“The field is bigger than it used to be, so when a revolution like [transformer models] happens, all of a sudden there are many hungry young researchers ready to do the exploiting. That is what we are seeing now,” Stent said. “In the early 2000s, there were really only two dozen people who could exploit a release, but now there are maybe 20,000.”

In 2020, there were more than 80 stories mentioning NLP published on WatersTechnology—that’s more than half the total number of stories ever published on our site that mention the technology. Below is a look at some of the bigger NLP-focused projects that we wrote about this year.

Bloomberg, Refinitiv, and BERT

Refinitiv added a Covid-19 News Tracker app to its Macro Vitals offering to help firms find actionable information on specific companies and how they were affected by the pandemic. The tracker uses machine learning to ingest and filter news articles, and classify reported events as “risks”, “opportunities”, or “neither” for specific companies.
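Refinitiv has not published the tracker’s internals, but the classification step can be sketched as a three-way text classifier. The toy keyword heuristic below is purely illustrative and stands in for the actual machine-learning model; the term lists are invented for the sketch.

```python
# Illustrative stand-in for a risk/opportunity/neither news classifier.
RISK_TERMS = {"lockdown", "closure", "layoffs", "default", "shortage"}
OPPORTUNITY_TERMS = {"surge", "demand", "expansion", "approval"}

def classify_event(headline: str) -> str:
    """Label a headline as a risk, an opportunity, or neither."""
    text = headline.lower()
    risk = sum(term in text for term in RISK_TERMS)
    opp = sum(term in text for term in OPPORTUNITY_TERMS)
    if risk > opp:
        return "risk"
    if opp > risk:
        return "opportunity"
    return "neither"
```

A production system would learn these associations from labeled news data rather than hand-written term lists, but the input/output contract is the same.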

This tool employs BERT (Bidirectional Encoder Representations from Transformers), a game-changer for NLP that Refinitiv and Bloomberg, among others, have been working to adapt to financial services use cases since its release in 2018.

The power of BERT is that it reads a sentence bidirectionally, giving the model far more context about the text and strengthening its interpretation.
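BERT itself is a large pretrained network, but the value of seeing both sides of a word can be illustrated with a toy count-based masked-word predictor. The mini-corpus and word choices here are invented for the illustration.

```python
from collections import Counter

# Tiny illustrative corpus; a real model is trained on billions of words.
CORPUS = [
    "the bank raised interest rates",
    "the bank raised capital reserves",
    "the river bank flooded last night",
]

def predict_masked(left: str, right: str) -> str:
    """Predict a masked word from BOTH its neighbours. A left-to-right
    model would only see `left`; using `right` as well disambiguates."""
    counts = Counter()
    for sentence in CORPUS:
        words = sentence.split()
        for i in range(1, len(words) - 1):
            if words[i - 1] == left and words[i + 1] == right:
                counts[words[i]] += 1
    return counts.most_common(1)[0][0] if counts else ""
```

Given only the left context “raised”, the masked word could be “interest” or “capital”; adding the right context resolves it, which is the intuition behind bidirectional encoding.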

Companies like Bloomberg and Refinitiv have access to massive amounts of data, and have been training BERT on that data to apply it to use cases for client solutions.

Refinitiv has been looking into selling news feeds to customers wanting to get an edge, whether in pre- or post-trade processes. The data giant’s Innovation Lab built a transformer model based on BERT, and—crucially—trained with financial data, that has been adapted for use throughout the business.

Refinitiv ingests huge amounts of news data for its Intelligent Tagging platform, which derives meaning from unstructured data, such as news articles going back to 1996, and processes millions of these document sources daily, making information available to customers on the Eikon terminal.

Refinitiv’s World Check team, which collects information about financial risks of individuals, is also using it. And in Eikon, the model helps return the most relevant articles possible to users who search for news about companies in the terminal’s Investor Briefs.

At Bloomberg, similarly, BERT is helping Terminal users access the most relevant financial information. Bloomberg takes in a huge number of articles every day and clusters them by topic or category, and then again by event. Relevant stories appear to Bloomberg Terminal users grouped under an automatically generated headline that the transformer itself has produced.

The model is further trained by Bloomberg journalists, who can use “thumbs up” and “thumbs down” icons—like the same buttons on streaming music service Pandora, which the company uses to tailor music channels more accurately to users’ tastes—to teach the model what humans consider to be good headlines. A bad headline gets a thumbs down, a good one a thumbs up.

Bloomberg is using the solution in other areas too, including helping customer service staff to answer user queries.

The data giant also developed a new search functionality that is embedded into its Trade Order Management (Toms) Trade Analyzer. This new question-answer interface is, at its core, an engine that heavily leverages NLP and machine learning algorithms to deliver answers to questions that are unique to the needs of traders. It aims to reduce the number of clicks needed to search for and within certain datasets, such as querying trade histories with a buy-side customer or looking up missed trades during a given time period.

JP Morgan Asset Management

JPMAM looked to build out its NLP tool, Textual Analysis, to read Chinese and Japanese documents.

Textual Analysis, which went live in November 2019, is used by the firm’s portfolio managers in both its quant and fundamental investment teams to read millions of documents, ranging from company filings and corporate event transcripts, to employee reviews on sites like Glassdoor.

For quant teams, Textual Analysis generates a raw signal, which has demonstrated strong investment returns, according to the firm. The investment teams can then interact with the front end of the tool to better understand what is driving that score. For fundamental analysts, the offering creates a dashboard they can use as a screening tool, and allows them to do a deep dive into the companies held in a portfolio.

Textual Analysis relies heavily on BERT, which gives the model context to the text it reads in documents. JPMAM added features to make BERT suitable for application in finance.

Ping An

Ping An Insurance Company of China is arguably one of the leading companies—regardless of industry—when it comes to NLP development.

Ping An is the country’s largest insurance firm. Its platform, Omni-Sinitic, is based on semantic technologies, including NLP, knowledge graphs, and robot testing. Omni-Sinitic scored higher than tech leaders like Microsoft and Google in benchmarks that evaluate systems on their level of natural language understanding.

Jing Xiao, chief scientist at Ping An, said the firm is using NLP in cases like its Smart Audio Robot, which helped to contain the spread of Covid-19 in Wuhan, China, the epicenter of the original outbreak. The robot completed over a million screenings across thousands of households in Wuhan, and identified thousands of cases for tracking by local officials.

Omni-Sinitic comprises a pre-trained language model called ALBERT, a “lite” version of BERT; an internally adaptive filter that augments data in the model; and neural-architecture search, which is used for designing neural networks.

Lazard Asset Management

Lazard developed a Covid-19 data model to more accurately reflect the health of corporates throughout the virus outbreak. As part of this model, the firm uses NLP and sentiment analysis from alt data sources, such as online content, news, and transcripts, to complement its Covid-19 investment strategy.

The firm has developed themed asset categories, and uses NLP and network theory to identify new associations between companies, looking at areas such as their supply chains, location, or mentions in the same press article.
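Lazard has not detailed its graph methods, but the co-mention idea can be sketched simply: treat companies as nodes and add an edge each time two appear in the same article. The company names in the usage below are placeholders.

```python
from collections import defaultdict
from itertools import combinations

def comention_graph(articles):
    """Given per-article lists of companies mentioned, return a mapping
    from each company pair to its co-occurrence count (a weighted edge)."""
    edges = defaultdict(int)
    for companies in articles:
        # Sort and de-duplicate so (A, B) and (B, A) count as one edge.
        for a, b in combinations(sorted(set(companies)), 2):
            edges[(a, b)] += 1
    return dict(edges)
```

On top of such a graph, standard network measures (shared neighbors, edge weights) can surface associations, such as supply-chain links, that keyword screens alone would miss.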

Morgan Stanley

AlphaWise, a research unit within the bank that serves hedge fund managers through an online web portal, leveraged alternative data to track the outbreak of Covid-19. The research business used three core means of offering insights on the coronavirus outbreak as it developed in China: web research, visualization tools, and market research.

The web research method uses a multinational corporation sentiment index, which can aggregate and analyze sentiment from company documents and investor-meeting transcripts using machine learning and NLP. The bank used this approach in the early days of the pandemic to gain a better understanding of how corporates were responding. In early January, the research unit was able to extract data from 2019 fourth-quarter earnings to determine how foreign companies were approaching the crisis and handling their operations in China, and to learn from uncertainties associated with the crisis as it was worsening.

“We were able to get all of these very precious data points—[at a time] when you would not be able to get much corporate or economic-related information during the Chinese New Year holiday period—overlayed with a nationwide quarantine,” said Laura Wang, equity strategist at Morgan Stanley.

S&P Global Market Intelligence

S&P Global released Machine Readable Filings, the fourth product within its Textual Data suite. The tool cleans and parses regulatory filings to generate machine-readable text, on which users can then apply NLP to look for directional indicators of how companies are faring amid the Covid-19 outbreak.

Machine Readable Filings covers about 35,000 companies and was trained using filings dating back to 2006. The product uses the concept of topic modeling, which goes further than looking for whether keywords, such as “coronavirus,” appear in a text. It first identifies which SEC-mandated sections the word appears in—for example, in a 10-K filing, it could be in the business overview, risk factors, or management’s discussion and analysis—and then analyzes the N-grams, or the words that follow and precede the keyword.
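The N-gram step described above amounts to pulling a window of words around each keyword hit. A minimal sketch of that windowing, with an invented example sentence:

```python
def ngram_window(text: str, keyword: str, n: int = 3):
    """Return, for each occurrence of `keyword`, the n words
    immediately before and after it."""
    words = text.lower().split()
    hits = []
    for i, word in enumerate(words):
        if word == keyword:
            hits.append((words[max(0, i - n):i], words[i + 1:i + 1 + n]))
    return hits
```

It is these surrounding words, not the keyword itself, that distinguish “net profit is going up” from “net profit is declining” in the quote below.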

“It’s one thing for an executive to mention growth margin or net profitability; it’s another thing for them to mention net profit is going up or net profit is declining,” said Kevin Zacharuk, senior product manager for Textual Data.

Manulife

Creating investment portfolios for consumers who are interested in environmental, social, and governance (ESG) factors has become a hot subject in recent years. In many cases, firms are using NLP to parse unstructured data for more insight into the ESG performance of the companies that comprise a portfolio.

Insurer Manulife spent two years fine-tuning its ESG methodology for investing in equities and fixed income, as it tried to integrate ESG principles into its entire investment process, from how it sources ESG data, to how it incorporates it into its valuation models, to securities selection, portfolio construction, and risk management.

One area that the firm is exploring is using NLP for analytics that allow it to quickly find ESG news and get a sense of its implications. The firm’s ESG teams use AlphaSense, a provider of market intelligence and NLP solutions, to round out or map its ESG data, complementing the firm’s existing datasets by pulling in textual information from traditional and alternative sources to create a coherent ESG signal.

MSCI

Market index and data provider MSCI is using NLP to measure company exposure to innovation for passive and active management, evaluating and mapping portfolio exposures to draw conclusions.

NLP is used to screen company information, including business descriptions and standard industrial classification of economic activities (SIC) codes, to identify keywords related to innovation. For example, the AI-based engine will scan words related to research and development activities or projects tied to improving efficiencies, which are then cross-checked with company filings to see to what extent those innovative investments are linked to revenues and performance.

Hitendra Varsani, a quantitative investment strategist at MSCI, said that although it is difficult to measure the concept of innovation and translate that into data factors, the index firm has been able to use these alternative methods to evaluate and map portfolio exposures.

“We can quantify that number on home ground, we can quantify it and then normalize it versus the rest of the [investment] universe and say, for example, ‘Tesla is more innovative than say BMW’. Once we can do that, we can measure the new factor exposures, and then we are in home territory and can measure the performance attribution to innovation,” Varsani said.

DNB Bank

This year, DNB, Norway’s largest financial services group, began the final stage of its three-year big data and data science initiative. The strategy, which included the launch of its Big Data repository deployed on Amazon Web Service’s cloud last year, has since allowed the firm to leverage new technologies, including NLP, to drive better business practices, data governance, and customer insights.

The bank has developed its NLP capabilities through a partnership with the Norwegian University of Science and Technology’s Ph.D. program. One model the groups have developed interprets documents, understands what department to send them to for processing, and then automates that process.

Numerix

Firms are also using NLP to navigate the Libor transition. Risk technology provider Numerix teamed up with NextGen Strategic Advisors to introduce a new module, called Oneview for the Libor Transition. The module aims to help firms (mainly those outside of the top 10 largest banks) overcome the legal, operational, technological, and risk challenges associated with Libor’s discontinuation.

This is a massive technological undertaking, said Steven O’Hanlon, Numerix’s CEO. First, a bank will have to locate these contracts, as many of them are not digitized and are off-premises, he said. After that, the lawyers come in to review all those legal documents, find all the language pertaining to Libor, and then set forth addendums for replacement language around the alternative reference rates.

The Libor Transition module, which was written in Python, uses an open-source NLP tool to read documents and pluck out the necessary legal terms. Google’s open-source TensorFlow handles the machine-learning component, providing a modeled curve structure and volatility surface (volatilities used to price trade instruments), and flowing documents either into a bank’s internal systems or into Numerix’s Oneview platform.
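Numerix has not published its extraction logic, but the clause-spotting step that precedes legal review can be sketched with simple pattern matching over contract text. The pattern and sample contract language below are illustrative only.

```python
import re

# Match whole sentences that reference Libor by either common name.
LIBOR_PATTERN = re.compile(
    r"[^.]*\b(?:LIBOR|London Interbank Offered Rate)\b[^.]*\.",
    re.IGNORECASE,
)

def find_libor_clauses(contract_text: str):
    """Return sentences referencing Libor, as candidates for legal review."""
    return [m.group(0).strip() for m in LIBOR_PATTERN.finditer(contract_text)]
```

In practice a model would rank and classify the flagged clauses (fallback language, rate definitions, and so on); the regex pass merely narrows millions of pages to the sentences lawyers need to read.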

Xceptor

Data ingestion and process automation provider Xceptor is building out the NLP and machine learning functionalities of its no-code platform, in which the end user performs the configuration, rather than a data engineer. 

Xceptor can consume simple, structured, and unstructured data. As more firms automate the handling of unstructured data, Xceptor uses NLP to read and route email instructions from clients and to identify clauses in complex contracts.

Deutsche Bank recently onboarded Xceptor’s platform to automate its core operational processes in Indonesia. Xceptor will help the bank automate reconciliations with multiple external parties for its securities services business.

IPC & GreenKey

The two companies, which specialize in communications technology, extended their partnership on Blotter, a data visualization and front-end dashboard that allows traders to view and analyze voice-trade information. This latest advancement will leverage GreenKey’s NLP engine to convert voice quotes into a structured data feed and allow IPC’s customers to choose their own transcription services.

“With regards to AI, what we are looking to do is extend our portfolio of partners and leverage their technology capabilities, across both our voice and data network,” said Rob Coole, vice president of cloud technologies at IPC.

Through GreenKey’s NLP engine, Blotter already has an instant transcription service that removes the need for manual inputs, thus saving time and reducing human error. What IPC wants to do next is open the platform up to other transcription-services providers after containerizing Blotter on the OpenFin platform.

BNP Paribas

The French bank developed a model to find sentiment indicators in news reports to forecast company returns. The bank said it is not yet a major user of NLP, but wanted to leverage two powerful trends: the widespread availability of—and easy access to—unstructured text data, such as news reports; and major advances in NLP.

The current focus of the project, which BNP began working on in early 2018, is on finding sentiment signals for equities, but the bank plans to extend it to corporate bonds in the future.

HSBC

HSBC Securities Services sought to improve the customer experience with its chatbot. Through the messaging platform Symphony, the firm has been able to facilitate chatbot-to-chatbot interactions that handle customer problems independently of humans.

The chatbot has been available externally since 2019, and Stephen Bayly, global head of securities services technology at HSBC, said he has seen a dramatic difference in the number of phone-related queries since the firm implemented the chatbots.

“In the first month of activity, we had a reduction of 27% in the number of phone calls and manually-answered queries we were receiving, and that’s just continued,” he said.

Users type questions into the chat box; the bot interprets the meaning using NLP and sends the query to the appropriate HSBC system via an API, with users getting a response back almost instantly. This routine problem-solving has in some cases turned into interaction with customers’ own chatbots, meaning that issues are resolved without any human interaction, and often faster because they are fully automatic.
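HSBC has not disclosed the bot’s internals, but the interpret-then-route step can be sketched as classify-then-dispatch. The intent names and keyword lists below are hypothetical; a production bot would use a trained NLP model rather than keyword matching.

```python
# Hypothetical intents mapped to illustrative trigger phrases.
INTENT_KEYWORDS = {
    "settlement_status": ["settle", "settlement", "failed trade"],
    "corporate_actions": ["dividend", "rights issue", "corporate action"],
}

def route_query(query: str) -> str:
    """Classify a user query by intent and name the back-end route;
    unmatched queries fall through to a human."""
    text = query.lower()
    for intent, keywords in INTENT_KEYWORDS.items():
        if any(keyword in text for keyword in keywords):
            return intent
    return "human_handoff"
```

Once an intent is assigned, the query can be forwarded to the matching internal system over an API, which is what lets two bots resolve an issue with no human in the loop.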

When the pandemic hit and remote working became the norm for the vast majority of financial services, HSBC updated the bot to become a virtual assistant able to take action on user queries, such as ordering new hardware, deploying software, or setting up Zoom meetings.

IHS Markit

The company added unstructured data, in the form of research articles and papers, to its proprietary Data Lake. By the end of Q4, it aimed to have uploaded about a million documents published by internal analysts over the past 10 years, covering multiple industries and sectors. The documents were to be summarized and tagged so that users can understand their gist, and search for articles and reports by topic. IHS Markit used various machine learning and NLP techniques for the tagging system, including BERT.

Financial Conduct Authority (FCA)

The FCA is using NLP and machine learning as part of a broader strategy of utilizing data more effectively, according to Steven Green, who is head of Central Data Services, part of the Innovation Division in Strategy and Competition at the regulator.

Green said the FCA is looking at NLP for analyzing business plans and other documents to understand where it should be looking to police the markets.

“We are looking at combinations of new datasets to spot outlier firms, to look at patterns of firms’ behavior, to look at the way the data reflects those firms that act differently to others, to see what’s going on there, and maybe that allows us to focus our efforts when we have such a broad suite of firms we are looking after,” Green said.
