Natural Language Generation: Great Promise, Significant Barriers

Sell-side firms and data providers are increasingly experimenting with natural-language generation to create new forms of automatically curated reports, emails and alerts, but the technique comes with significant challenges.

  • NLG is starting to make inroads in financial services, in use cases such as data summarization and chatbots.  
  • Some firms believe NLG will become a game-changer in how data is consumed, while others are more reserved in their predictions of what the technology can achieve.
  • An opposing argument is that extractive summarization techniques are superior to NLG’s abstractive approach.
  • One major challenge is turning domain expertise into machine logic.
  • While there have been significant breakthroughs in NLG, many firms warn of the risks of using language prediction models, like GPT-3, in financial services applications.

If only data could talk—well actually, it can.

Over the last decade, thanks to the proliferation of open-source tools, improved computing power, further cloud adoption, and major advancements in the retail space, natural-language processing (NLP) is becoming democratized, and capital markets firms are taking notice. The next evolution of NLP, though, is natural-language generation (NLG). While the technology has been around for decades, it is only in the last two years that it has begun to make meaningful inroads in financial services.

During a fireside chat at this year’s WatersTechnology Innovation Exchange, Kim Prado was asked to name one trend that will bring the most value to her business in the next five years. The global head of client, banking, and digital technology for Royal Bank of Canada’s Capital Markets group responded, “Definitely taking natural-language generation to the next level.”

While there is a lot of excitement around this improving technology, as Prado notes, it needs to get to “the next level” if it is going to live up to its promise and help traders and portfolio managers find valuable information on which to execute.

Those singing NLG’s praises say it will become a game-changer for those trying to make sense of copious amounts of data flashing across their screens, but others are more reserved in their predictions of what it can achieve. Tim Nugent, senior research scientist at Refinitiv, who specializes in machine learning, natural-language processing, blockchain, and cryptocurrency research, says that through internal research and testing, Refinitiv found extractive techniques—which pick out the most salient words or phrases directly from the original text to create a summary—to be more effective than NLG at summarizing text.

“By coming up with an NLP-based scoring function at the sentence level, we think we can create better summaries, [rather than] by applying some type of NLG approach,” Nugent tells WatersTechnology. “And I think when we benchmark our methods and score them, at the moment, it’s obvious to us that extractive approaches are superior.”

Adding to the attention around NLG are the significant breakthroughs that have happened in the last couple of years—most notably with the advancements in pre-trained language models such as the latest release of Generative Pre-trained Transformer 3 in June (see GPT-3 Box at the bottom of the page). Paul Tepper, an executive director in Morgan Stanley’s technology division, says that while these advancements are impressive, heavily regulated financial institutions need to be mindful of the risks associated with advanced pre-trained language models.

“I would be pretty shocked if any large company let one of these things loose on their customers, or in any other specific case, because you really don’t know what it’s going to do, like it is a probabilistic model. And it is basically just generating the next word based on things it has seen already, so it is quite unpredictable. We are not using anything like that,” Tepper says.

‘Low-Hanging Fruit’

From 2008 to 2013, Vicky Sanders was the global and European head of equity sales at Goldman Sachs. Back then, she arrived in the office at around 6 am, spending the bulk of her mornings drafting summaries based on the latest events impacting the equities portfolios she covered. Those summaries were sent out to client analysts’ and portfolio managers’ inboxes ahead of their workday. Today, Sanders sits on the other side of the tracks as global head of investment analytics at Liquidnet, where her team is using NLG to automate this cumbersome task.

Liquidnet uses NLG to automatically create customized email alerts for a specific portfolio, which are then sent to portfolio managers and traders throughout the day. The NLG capability sits within Liquidnet’s Investment Analytics suite, which was created after the acquisitions of RSRCHXchange (of which Sanders was co-CEO), Prattle, and buy-side analytics platform provider OTAS Technologies. Leveraging portfolio managers’ watchlists, Liquidnet’s system can alert users to relevant research reports from across its research library, which includes more than 430 research providers. NLP is initially used to comb through a further library of content, including bank records and public information such as earnings calls and press releases. Once the most important information is extracted, the NLG kicks in and converts the data into email alerts written in natural language.

Sanders says it is these types of repetitive, low-skilled, and time-consuming tasks that are ripe for NLG.

“There is still a huge amount of low-hanging fruit in our industry to use things like NLG to solve for—what can only otherwise be described as manual labor,” she says. She adds that NLG is best placed to handle simple tasks that will help free up teams on both sides of the Street to do more with less.

Our fundamental finding was that it’s easier for us to do the work to convert the content to NLG text than it is for every single one of our users to interpret charts, tables of data, and numbers in the heat of battle, so to speak. So, it is just simply a more efficient way of communicating content to humans.
Tom Doris, Liquidnet

Tom Doris, chief data scientist at Liquidnet—who was formerly CEO of OTAS—points to another example of how NLG is automating tasks. Historically, traders or portfolio managers would have a quant sitting by their desk crunching numbers and trying to surface actionable insights. He says “the human in the loop”—meaning the quant, in this case—was there to simplify quantitative analysis and make it easier for the front office to understand. Today, many traders and portfolio managers have automated analytics charts and alert systems lighting up their screens.

However, Doris says that today, NLG is helping to turn those colorful analytics charts into automatically generated, readable information. For example, if the spread or liquidity profile of a stock changed significantly intraday or a company’s stock price drastically moved compared to its competitors, he says these types of market shifts can be summarized in one or two sentences—meaning there is less “subjectivity” or “mental workload” for the trader to cope with.

“Our fundamental finding was that it’s easier for us to do the work to convert the content to NLG text than it is for every single one of our users to interpret charts, tables of data, and numbers in the heat of battle, so to speak. So, it is just simply a more efficient way of communicating content to humans,” Doris says.

Liquidnet’s NLG technology is rules-based and includes a finite list of sentences and paragraph templates for the system to choose from when converting data into natural language. While the model can be trained to turn structured data and routine events (i.e., price shifts) into bite-sized summaries, Sanders says more complex readings of data should be left up to the human.
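A rules-based engine of this kind can be sketched simply: a finite set of sentence templates, keyed by event type, with slots filled from structured data. The template wording and event names below are illustrative assumptions, not Liquidnet’s actual templates.

```python
# Minimal sketch of rules-based NLG: pick a template by event type,
# then fill its slots from structured market data.
TEMPLATES = {
    "price_move": (
        "{ticker} {direction} {pct:.1f}% intraday, versus a "
        "30-day average move of {avg_pct:.1f}%."
    ),
    "spread_change": (
        "The bid-ask spread on {ticker} has {direction} to "
        "{spread_bps} bps, well outside its usual range."
    ),
}

def render_alert(event: dict) -> str:
    """Select the template for the event type and fill in the data."""
    template = TEMPLATES[event["type"]]
    return template.format(**event)  # extra keys are simply ignored

alert = render_alert({
    "type": "price_move",
    "ticker": "XYZ",
    "direction": "fell",
    "pct": 4.2,
    "avg_pct": 1.1,
})
print(alert)  # → XYZ fell 4.2% intraday, versus a 30-day average move of 1.1%.
```

Because the sentence inventory is finite and hand-written, the output is fully predictable—the property Tepper later cites as the reason to avoid machine-learned generation.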

“When it comes to interpreting a [complex] chart, that’s probably where you’d want to lean more on the human intelligence and the artificial intelligence (AI). … I think for our industry and the use cases we’re looking at, most of it is still in the low-hanging fruit area, as opposed to further up the value curve or intelligence curve,” she says.

A Character Issue

Ted Merz, global head of news product at Bloomberg, describes NLG as a proxy—in other words, a single cog in a complex engine that turns data into human language. He says the data and media giant’s readership of NLG-generated news articles, from multiple media outlets and accessed through its Terminal, has gone from zero to 7% in the last two to three years.

Bloomberg is using NLG to produce four types of automated news articles: stories compiled from alternative data sources, corporate filings (i.e., earnings calls), combined data sets (a mixture of traditional and alternative data), and market anomalies.

Taking the first group of articles: these are generated from “obscure” alt data sets, as Merz calls them, such as the number of people riding the subway, the number of people working in an office, or the number of restaurant bookings via applications such as OpenTable. Merz says that because portfolio managers are less familiar with these types of data sets, turning them into articles makes them easier to understand.

“If you were only looking at the dataset from OpenTable about restaurant bookings, it would be hard to understand what was happening. So, what we do is write scripts [or articles] that say, ‘The number of restaurant bookings are up or down versus comparable periods; these are the areas and places where they’re increasing or decreasing the most,’ and it really makes that data much faster and easier to absorb,” he adds.

Like most NLG systems, Bloomberg’s is based on a set of rules. Within that, it also has thresholds for when an automated article should be published. For instance, take the third group of news stories, generated from combined data sets. In the event of a hurricane, the tool would pull in data sets such as weather information and geolocation data to predict what company assets could be at risk of being impacted. As an example, Merz says if the hurricane were to hit 25% of Exxon’s oil facilities, the risk percentage would be enough to trigger the system to publish the article. The logic behind this is to avoid spamming traders or portfolio managers with meaningless information.
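The publish-threshold rule Merz describes can be expressed as a simple materiality check: only trigger an article when the share of affected assets crosses a cutoff. The 25% threshold comes from his example; everything else here is an illustrative assumption.

```python
# Sketch of a publish-threshold rule: suppress automated articles
# unless the fraction of affected facilities is material enough.
RISK_THRESHOLD = 0.25  # publish only if >= 25% of facilities are at risk

def should_publish(facilities_at_risk: int, total_facilities: int) -> bool:
    """Return True when the at-risk share meets the materiality cutoff."""
    if total_facilities == 0:
        return False
    return facilities_at_risk / total_facilities >= RISK_THRESHOLD

# A storm path covering 25 of a company's 100 facilities meets the bar;
# 10 of 100 does not, and the article is suppressed to avoid spam.
print(should_publish(25, 100))  # → True
print(should_publish(10, 100))  # → False
```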

When building an NLG system for summarization, one of the hardest things to get right is data quality. Merz says when building its summarization tool—which summarizes multiple news articles—a big decision the development team had to wrestle with was the maximum length for the summary. Although this sounds like a trivial challenge, he says the character limit can drastically change the way the NLG generates a summary.

The team found that if the length was too short, it did not make sense; if it was too long, it was no longer that useful as a summary.

“If you said the length was 10 characters, 30 characters, or 60 characters, you would get a different summarization every time. It is not like you just add characters; the computer recalculates the summary completely differently,” he says.

A Matter of Understanding

Tepper has over 20 years’ experience working with NLG and has written his undergraduate and master’s theses on the technology. Today, he sits as an executive director in Morgan Stanley’s technology division, where he focuses on AI and NLP for wealth management applications.

While most firms exploring NLG have been using it to summarize data, Morgan Stanley has taken a slightly different route by embedding it in its chatbot for assisting its financial advisors. The bank has an internal contact center that fields calls to its financial advisors and their support staff, but the hope is that with the development of a chatbot assistant, the NLG feature could alleviate some of the workload before reaching the advisor. The advisory chatbot is separate from Morgan Stanley’s AskResearch bot that was built to help bank analysts and sales teams query thousands of reports generated each year.

The challenges are getting that knowledge out of people’s heads, making sure it’s correct [and] precise, and that the language being generated sounds natural.
Paul Tepper, Morgan Stanley

“The call center internally fields, like, millions of calls a year—not billions—but millions of calls a year. So, it’s a significant cost and if we can divert some of the cost by answering the question with this self-service system, or we can sort of reduce the amount of time they spend looking for these answers by getting them part of the way there without having to talk to a person, we can both reduce the cost of running these systems, as well as potentially provide a better experience for our users,” Tepper says.

Tepper says the bank is about six months into building the advisory chatbot and has trained it with several hundred intents—meaning the bot should be equipped to answer hundreds of questions. The bank also has plans to extend the NLG function to client-facing applications like Morgan Stanley Online, its web-based and mobile-app interface.

Tepper says the most challenging part of building NLG-based tools—or even NLP, for that matter—is developing the ontology, otherwise known as a knowledge graph. The ontology organizes all of the information that the chatbot would leverage to understand the intent of a user’s query—including documents, legal entities, types of accounts, or types of businesses. Once the machine understands the user’s intent, it can prompt a dialogue.

To illustrate this further, Tepper says if an analyst typed “individual retirement account” into its chatbot, the chatbot would be prompted to ask a variety of questions, such as, ‘Do you want to open an IRA account?’, or, ‘Do you want to close an IRA account?’, and so on. Highly skilled knowledge graph engineers—who are in significant demand these days from Big Tech and financial firms—are responsible for building these complex ontologies and transferring domain expertise into knowledge representations.
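The intent-to-dialogue step Tepper describes can be sketched as a lookup from an ontology entity to its clarifying questions. The entity names, prompts, and fallback message below are hypothetical, not Morgan Stanley’s.

```python
# Sketch of intent matching against a tiny ontology: a query is mapped
# to a known entity, which prompts a set of clarifying questions.
ONTOLOGY = {
    "individual retirement account": [
        "Do you want to open an IRA account?",
        "Do you want to close an IRA account?",
        "Do you want to check IRA contribution limits?",
    ],
}

def prompt_dialogue(query: str) -> list[str]:
    """Return follow-up questions for the first entity found in the query."""
    normalized = query.lower().strip()
    for entity, prompts in ONTOLOGY.items():
        if entity in normalized:
            return prompts
    return ["Sorry, I didn't understand. Can you rephrase?"]

for question in prompt_dialogue("Individual Retirement Account"):
    print(question)
```

A production ontology would cover documents, legal entities, account types, and business lines, as the article notes—this lookup table stands in for that graph.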

“The challenges are getting that knowledge out of people’s heads, making sure it’s correct and precise, and that the language being generated sounds natural,” says Tepper.

Tepper says Morgan Stanley’s NLP and natural-language understanding (NLU) applications—NLU being the subset that focuses solely on the machine’s ability to read and comprehend—incorporate machine-learning techniques. Yet the NLG is based on traditional symbolic rules, and for good reason, Tepper says.

“The trouble with the NLG being machine learning-driven is that if it’s actually generating what to say, then you’re going to be trading off some of the control—you won’t know necessarily what it’s going to say. So you don’t necessarily want to be in a situation where you’re generating content that somebody is going to read and then go forward with, [but] you don’t know what it is potentially going to put together [and] what it’s going to say. So, you have to be kind of restricted on how you can use machine learning there,” Tepper says.

Abstractive vs Extractive

Refinitiv’s Tim Nugent splits data summarization into two categories: abstractive and extractive. NLG falls into the abstractive category, meaning it generates an abstract summary based on a fixed language model—or a finite number of sentence structures—and the data it is tasked with summarizing is used to fill in the gaps. Take this sentence for example: “The S&P rises by 1% today.” The words “rises” and “1%” could change depending on whether the S&P rose or dipped that day; however, the rest of the sentence could be fixed. This is a basic example of how NLG generates a sentence.

In the extractive group, NLP is used to pull lines directly from an article to form a summary. Nugent says Refinitiv has experienced more success in using this approach than NLG.

“Without any fancy bells or whistles, a very strong baseline approach to summarize an article is to actually take the first three sentences, and this is because journalistic style tends to dictate that you put the most important aspects of the entire article up front,” he says.
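The “first three sentences” baseline Nugent describes can be written in a few lines. The sentence splitter here is a naive regex sketch; a production system would use a proper tokenizer (e.g. from NLTK or spaCy), and the sample article is invented for illustration.

```python
# Lead-3 extractive baseline: take the first N sentences of an article,
# exploiting the journalistic convention of front-loading key facts.
import re

def lead3_summary(article: str, n_sentences: int = 3) -> str:
    """Split on sentence-ending punctuation and keep the first few sentences."""
    sentences = re.split(r"(?<=[.!?])\s+", article.strip())
    return " ".join(sentences[:n_sentences])

article = (
    "The central bank held rates steady on Wednesday. "
    "Policymakers cited slowing inflation. "
    "Markets had priced in the decision. "
    "Analysts now expect a cut next quarter."
)
print(lead3_summary(article))
```

Note that nothing in the output is generated: every word comes from the source text, which is why extractive summaries cannot hallucinate content the way abstractive models can.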

His view is that summaries produced by NLG are more likely to omit important bits of information from the article or document than those produced with an extractive approach. He says factors like the length of the summary and the length of the input documents can overwhelm the NLG, which affects the quality of its output.

“If you were to drop the entire earnings call transcript into an NLG approach, you’ve got so much information, you’ve got so much text from which to source your abstractive summary from, you are much less likely to capture the critical sentences than, for example, if you apply this extractive approach, so the length of the input document actually has a fundamental impact on the quality of the output depending on whether you choose an NLG abstractive approach or an extractive approach,” he says.

Story Time

RBC’s Kim Prado, on the other hand, sees NLG as a tool for telling a story about the bank’s data. RBC opted to use NLG over other summarization techniques following multiple proofs of concept over the last several years, she says, during which the team has become comfortable using the technology.

“Our focus was on reducing the data ‘firehose’ effect, and providing key actionable insights from data, and in this case, NLG was able to solve for that business problem with an impact,” she says.

RBC is now building a summarization engine for turning unstructured client-interaction data into human-readable summaries. For the technology to work, the firm’s NLP first identifies important client-interaction data—such as product mentions or follow-up activities—from sources like Salesforce, its client relationship management system, and proprietary and vendor chat applications.

The NLG then converts the interaction data into natural-language summaries. Those summaries are then pushed out to users of the data: in this case, the sales team, traders, product owners, and even senior executives.

“We are hoping that by providing a clean and relevant summary, it gives our users the incentive to better use our systems, and eases their minds in needing to browse through pages of reports in order to make sense of what is happening,” Prado says.

For building its NLP, the bank has used Google’s Bidirectional Encoder Representations from Transformers (BERT), an open-source transformer-based language model, as well as model embedding, sequence-to-sequence transformer models, and classical machine-learning techniques. The bank has no plans to train its NLG using BERT or other advanced encoder-decoder models, says Prado, but the team is open to experimenting with new ideas as they mature.

Regardless of the camp you’re in when it comes to NLG—whether you think it will revolutionize dashboards or play a small role in how portfolio managers consume information or how chatbots operate—it is clear that NLG is making inroads in financial services.

The way traders and portfolio managers consume data, and the battle over desktop space, have been a major focus of end-users and vendors alike since the advent of computers on Wall Street. The big test for NLG will be whether the front office finds an appetite for natural-language summaries on its screens, or opts for flashing visuals and NLP-powered analytics.

GPT-3: Breakthrough or Calamity?

One of the recent developments in the world of NLP and NLG that has brought about a mixture of excitement, buzz, and concern is Generative Pre-trained Transformer 3, or GPT-3.

The autoregressive language model uses deep learning to produce natural-language text. It was created by OpenAI, a research business co-founded by Elon Musk, and has about 175 billion parameters, having been pre-trained on a vast corpus of text.

While the technology can be applied to multiple tasks—such as generating summaries or even writing code—some are skeptical of applying it to real-world applications, particularly in the heavily regulated financial services industry.

“We just absolutely refuse to even run that risk,” says Tom Doris of Liquidnet, in talking about GPT-3 and whether the firm uses it in its own NLG summarization tools. Rather, the firm’s NLG is built on a rules-based engine and proprietary ontology. “The solution that we’ve come up with works very well for us, without needing to train on huge corpuses of un-curated content.”

Tim Nugent of Refinitiv is also a self-proclaimed skeptic of GPT-3. He agrees with the view that it still is not well understood how the technology works. To illustrate this, he uses an example of a GPT-3 model being tasked with summarizing an article in 50 words. He says that although the model may use the article to inspire the summary, it also draws on an unknown corpus of pre-training data, thus making it difficult to predict what information it could spit out.

“You actually have much less control over what GPT-3 generates than you might think, and that absence of control should absolutely be concerning for customer-facing output,” Nugent adds.

Nugent believes there are fewer safeguards when using GPT-3 to protect clients from noisy data, potentially inappropriate language, and simply poor-quality data.

Paul Tepper of Morgan Stanley says the pre-trained neural networks that underpin GPT-3 are impressive, but echoes similar concerns about exposing clients directly to the technology. He says he is doubtful that any major organization would have “the appetite for unleashing unsupervised learning on its customers.”

To drive home this point, he says institutions are heavily regulated entities and are limited as to what they can do with machine learning or AI. There are global guidelines, such as the European Parliament’s guidelines on ethics, which require financial services firms to explain how their algorithms and AI work.

Kim Prado of RBC says the bank has not yet explored GPT-3 in its NLG. “We have not tried out GPT-3, but we are actively watching the adoption in other areas. GPT-3 is closed source and we are yet to onboard it,” she says.
