Ping An Asset Management zooms in on NLP models for sentiment analysis
The asset management arm of Ping An Insurance (Group) Company of China is enhancing its NLP models to solve complex, non-linear challenges such as overfitting.
Ping An of China Asset Management (PAAMC) in Hong Kong, the asset management arm and a wholly owned subsidiary of China’s largest insurer, Ping An Insurance (Group) Company of China, is looking to upgrade its natural language processing (NLP) models, particularly for Chinese-language sentiment analysis.
Chi Kit Chai, head of capital markets and chief investment officer at PAAMC, tells WatersTechnology that vectorization of different words is the key to the success of any NLP algorithm. Vectorization is an NLP technique that maps words or phrases to vectors of numbers, which a model can then use to predict words and to measure similarity and meaning.
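To make the idea concrete, here is a minimal sketch that represents a few words as vectors and compares them with cosine similarity, a common measure of how close two word vectors are. The four-dimensional vectors are invented for illustration; real embeddings are learned from large text corpora and typically have hundreds of dimensions, and nothing here reflects PAAMC's models.

```python
import math

# Invented 4-dimensional word vectors for illustration only; real embeddings
# are learned from large text corpora.
EMBEDDINGS = {
    "profit": [0.9, 0.1, 0.3, 0.0],
    "earnings": [0.8, 0.2, 0.4, 0.1],
    "lawsuit": [-0.7, 0.6, 0.0, 0.2],
}


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 means identical direction,
    values near 0 mean unrelated, negative values mean opposing directions."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def most_similar(word: str) -> str:
    """Return the other vocabulary word whose vector is closest to `word`."""
    target = EMBEDDINGS[word]
    others = [w for w in EMBEDDINGS if w != word]
    return max(others, key=lambda w: cosine_similarity(target, EMBEDDINGS[w]))


print(most_similar("profit"))  # earnings
```

With these made-up vectors, "profit" lands close to "earnings" and points away from "lawsuit", which is the kind of structure a trained embedding is meant to capture.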
Chai says it can be harder to break down Chinese words in a meaningful way, but PAAMC aims to do so by using contextual information and sentiment analysis.
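One reason the task is harder in Chinese is that written Chinese does not put spaces between words, so text has to be segmented into words before it can be vectorized. A classic baseline for this is dictionary-based forward maximum matching, sketched below. The article does not say how PAAMC segments text; the tiny dictionary and the `forward_max_match` function are hypothetical, and production systems typically rely on far larger lexicons plus statistical or neural models to resolve ambiguous splits.

```python
# Hypothetical mini-dictionary; production segmenters use far larger lexicons
# and statistical or neural models to resolve ambiguous splits.
DICTIONARY = {"平安", "资产", "管理", "资产管理", "公司", "股价", "上涨"}
MAX_WORD_LEN = max(len(word) for word in DICTIONARY)


def forward_max_match(text: str) -> list[str]:
    """Greedy left-to-right segmentation: at each position take the longest
    dictionary word that starts there, or fall back to a single character."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, then shrink one character at a time.
        for size in range(min(MAX_WORD_LEN, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in DICTIONARY:
                tokens.append(candidate)
                i += size
                break
    return tokens


print(forward_max_match("平安资产管理公司"))  # ['平安', '资产管理', '公司']
```

The greedy rule prefers "资产管理" (asset management) over the shorter "资产" plus "管理"; that preference is what makes the method simple, and also why it can mis-split text when a longer dictionary match is the wrong reading.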
Breaking down Chinese characters is something Boston-headquartered PanAgora Asset Management has also dealt with. The asset manager developed its own machine-learning models to track chat and blog conversations in Chinese to determine market sentiment.
In 2019, Mike Chen, director of equity and head of sustainable investments at PanAgora Asset Management, told WatersTechnology that its solution relies on an entire language corpus used to train the NLP model that tracks the conversations. The challenge lies less in the use of different languages than in the slang and other informal terms that appear in markets-related conversations.
PanAgora deals with slang words in the Chinese internet community by waiting for them to gain prominence before it updates its NLP library.
“The library is the natural-language processing model. It just keeps on updating. When a new cyber slang word gains prominence, if the algorithm sees it sufficiently enough times, it will pick up on it. It’s fully automated and self-updating,” he said.
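The behaviour Chen describes, where a new term enters the library once it has been seen often enough, can be sketched as a frequency-threshold vocabulary. This is a minimal illustration under assumptions: the `SlangVocabulary` class, its `min_count` threshold, and the example token are hypothetical, and PanAgora's actual update logic is not described in the article.

```python
from collections import Counter


class SlangVocabulary:
    """Hypothetical self-updating vocabulary: an unknown term is promoted into
    the known vocabulary once it has been seen at least `min_count` times."""

    def __init__(self, known_terms: set[str], min_count: int = 3) -> None:
        self.known = set(known_terms)
        self.min_count = min_count
        self.candidates: Counter = Counter()  # sightings of not-yet-known terms

    def observe(self, tokens: list[str]) -> list[str]:
        """Count unknown tokens in one post; return those promoted by this call."""
        promoted = []
        for token in tokens:
            if token in self.known:
                continue
            self.candidates[token] += 1
            if self.candidates[token] >= self.min_count:
                self.known.add(token)
                del self.candidates[token]
                promoted.append(token)
        return promoted


vocab = SlangVocabulary({"股票"}, min_count=2)
print(vocab.observe(["韭菜", "股票"]))  # []
print(vocab.observe(["韭菜"]))  # ['韭菜']
```

The threshold is the design lever: set it too low and one-off typos enter the library; set it too high and genuinely new slang is picked up late.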
The work Chai and his team at PAAMC are doing to improve Chinese sentiment analysis will feed into the firm’s overall machine-learning framework, which combines deep-learning neural networks, gradient boosting machines, and advanced regression models.
Chai says that, in artificial intelligence terms, these combinations are called “ensemble methods.” The framework contains non-linear models, which help PAAMC capture factor interactions and non-linear patterns hidden in alpha signals. On top of that, the low correlations among the individual models can further increase Sharpe and information ratios, measures that help investors gauge the risk-adjusted returns of a security or portfolio.
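The link between low correlation and better risk-adjusted returns can be shown with a standard textbook simplification: if N signals share the same volatility and Sharpe ratio S, and every pair has correlation rho, an equally weighted blend has a Sharpe ratio of S * sqrt(N / (1 + (N - 1) * rho)). The sketch below computes that figure alongside per-period Sharpe and information ratios from return series. The equal-volatility, equal-correlation assumptions and the function names are illustrative only, not PAAMC's methodology.

```python
import math
import statistics


def sharpe_ratio(returns: list[float], risk_free: float = 0.0) -> float:
    """Mean excess return divided by its standard deviation (per period)."""
    excess = [r - risk_free for r in returns]
    return statistics.mean(excess) / statistics.stdev(excess)


def information_ratio(returns: list[float], benchmark: list[float]) -> float:
    """Mean active return divided by tracking error (per period)."""
    active = [r - b for r, b in zip(returns, benchmark)]
    return statistics.mean(active) / statistics.stdev(active)


def combined_sharpe(single_sharpe: float, n_signals: int, correlation: float) -> float:
    """Sharpe ratio of an equal-weight blend of n signals that share the same
    volatility and Sharpe ratio and a common pairwise correlation."""
    return single_sharpe * math.sqrt(n_signals / (1 + (n_signals - 1) * correlation))


print(combined_sharpe(0.5, 4, 0.0))  # 1.0
print(combined_sharpe(0.5, 4, 1.0))  # 0.5
```

With four uncorrelated signals the blend doubles the single-signal Sharpe ratio; with perfect correlation it stays where it started, because blending identical signals diversifies nothing.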
“Our framework merges the generation of alphas and alpha-weighting algorithms using machine learning techniques. For factor interaction and non-linearity, an example is the leverage of a company. Linear relations can only assume a company’s performance is proportional to its debt ratios. As a matter of fact, debt ratios bear non-linear patterns with a company’s performance,” Chai says.
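Chai's leverage example can be illustrated with synthetic data. Suppose, purely hypothetically, that a performance score peaks at a moderate debt ratio and falls off at both extremes. A least-squares straight line fitted to that data is essentially flat, so a purely linear model would conclude that leverage barely matters, while adding a squared debt-ratio term captures the pattern. The curve shape and numbers are invented for illustration and are not PAAMC data.

```python
import statistics

# Hypothetical data: a performance score that peaks at a 50% debt ratio and
# falls away at both low and high leverage (an inverted U).
debt_ratios = [i / 10 for i in range(11)]
performance = [1.0 - 4.0 * (d - 0.5) ** 2 for d in debt_ratios]


def r_squared(x: list[float], y: list[float]) -> float:
    """Share of the variance in y explained by a least-squares line in x."""
    mx, my = statistics.mean(x), statistics.mean(y)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum(
        (xi - mx) ** 2 for xi in x
    )
    intercept = my - slope * mx
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


# A linear model in the raw debt ratio misses the pattern entirely...
linear_r2 = r_squared(debt_ratios, performance)
# ...while a squared term (a simple non-linear feature) captures it.
squared_term = [(d - 0.5) ** 2 for d in debt_ratios]
nonlinear_r2 = r_squared(squared_term, performance)

print(linear_r2, nonlinear_r2)
```

Here `linear_r2` comes out at essentially zero and `nonlinear_r2` at essentially one. Models such as gradient boosting machines or neural networks can learn shapes like this without the analyst having to specify the squared term in advance.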
PAAMC uses historical structured and unstructured data, including news, price movement information, macroeconomic inputs, and company-specific accounting information, to train its AI algorithms. It monitors more than 300 factors and selects between 20 and 50 of them to construct its portfolios each month.
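The article does not say how PAAMC narrows its 300-plus factors to between 20 and 50 each month. One simple scheme, assumed here purely for illustration, is to score every factor monthly (for example, by its recent predictive power), keep those above a cutoff, and clamp the count to the 20-to-50 range. The `select_factors` function, its threshold, and the invented scores are hypothetical.

```python
def select_factors(
    scores: dict[str, float],
    score_threshold: float,
    min_keep: int = 20,
    max_keep: int = 50,
) -> list[str]:
    """Hypothetical monthly selection: rank factors by score and keep those at or
    above the threshold, but never fewer than min_keep or more than max_keep."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    passing = sum(1 for name in ranked if scores[name] >= score_threshold)
    n_keep = max(min_keep, min(max_keep, passing))
    return ranked[:n_keep]


# Invented scores for 300 monitored factors.
monthly_scores = {f"factor_{i:03d}": i / 300 for i in range(300)}

print(len(select_factors(monthly_scores, score_threshold=0.5)))  # 50
print(len(select_factors(monthly_scores, score_threshold=0.905)))  # 28
print(len(select_factors(monthly_scores, score_threshold=2.0)))  # 20
```

Clamping keeps the portfolio from becoming too concentrated in a month when few factors look strong, or too diluted when many do.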
According to Chris Vera, associate director at asset and wealth management consulting firm Shoreline, non-linear models can incorporate many more variables when drawing conclusions, whereas a linear model might use only two or three inputs to produce an output. “Something like, based on these two things, the stock will go up, risk will go down, for example,” he says.
In contrast, non-linear models can be used to describe text. “When you and I talk—the sentences we send to each other—we need to put long non-linear formulas to describe [the conversation] because we can use lots of different words and we can construct sentences in different ways. It’s more complicated than sending each other numbers because words are quite difficult to describe. There’s context, there’s language, there’s tone, there’s volume, there’s dialect,” he says.
Building on knowledge
The asset manager, which has more than $440 billion in assets under management, can draw on technology and, perhaps more importantly, ideas from the other units that sit under its parent company. Ping An Group has three main business segments (insurance, banking, and investment), all of which are supported by its technology arm.
While the asset management business benefits from applying technology that has already been developed in other areas of the group, Chai says different problems require varied solutions from multiple application domains.
For example, Omni-Sinitic, the machine-learning framework the group developed, is useful for NLP problem-solving. It has in the past beaten entries from companies such as Microsoft, Google, Alibaba, Huawei, and Facebook on the General Language Understanding Evaluation (Glue) benchmark, which is used to evaluate natural-language understanding systems.
For PAAMC, the focus is different. “Here we focus on solving problems in finance and investment. We deal with NLP in our machine-learning framework from a different perspective as we face different challenges. We also put a lot of emphasis on dealing with overfitting when we process very noisy data to extract high-confidence alpha signals,” he says.
Overfitting occurs when a model learns the detail and noise in its training data so closely that its performance on new data suffers.
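A toy example makes the definition concrete. In the sketch below, a model that simply memorises its training points (a nearest-neighbour lookup) scores perfectly on the data it was trained on, yet does worse on unseen points than a plain straight-line fit, which is the signature of overfitting. The data points are invented around the relation y = 2x plus noise and are purely illustrative.

```python
import statistics
from typing import Callable

# Invented training data scattered around the relation y = 2x.
train_x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
train_y = [2.3, 3.6, 6.4, 7.7, 10.2, 11.8]
# Invented unseen points from the same relation, with their own noise.
test_x = [1.5, 2.5, 3.5, 4.5, 5.5]
test_y = [3.1, 4.9, 7.0, 9.1, 10.9]


def memoriser(x: float) -> float:
    """Overfit model: returns the training y of the nearest training x, so it
    reproduces every training point, noise included."""
    nearest = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[nearest]


def fit_line(xs: list[float], ys: list[float]) -> Callable[[float], float]:
    """Least-squares straight line; returns a prediction function."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    return lambda x: my + slope * (x - mx)


def mse(model: Callable[[float], float], xs: list[float], ys: list[float]) -> float:
    """Mean squared prediction error of `model` on the given points."""
    return statistics.mean((model(x) - y) ** 2 for x, y in zip(xs, ys))


line = fit_line(train_x, train_y)
print(mse(memoriser, train_x, train_y))  # 0.0
print(mse(memoriser, test_x, test_y) > mse(line, test_x, test_y))  # True
```

The memoriser's training error is exactly zero because it has learned the noise along with the signal; on the held-out points the simpler line generalises better.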
As Shoreline’s Vera puts it, overfitting is a stumbling block that happens in data science “when you go from walking to running, and then you trip, scrape your knee, figure out what you did wrong, and then you start to run again.”
Overfitting tends to happen when data scientists throw too much compute power at a model, he adds. “This is where you need to take a step back because you can’t just train a lot of data, come up with something that is like a complicated jigsaw puzzle piece and assume that jigsaw puzzle piece can be used for other hypotheses. That’s overfitting,” Vera says.
According to Vera, PAAMC seems to have achieved “machine-learning sophistication” ahead of other asset management firms. “They’re well beyond the use-cases of forecasting liquidity, forecasting changes in risk, formulating portfolio construction—that’s all linear. When you’re dealing with overfitting, you’ve moved on to non-linear, and non-linear problems are a lot more data-hungry; they’re a lot harder to explain. If you’ve gotten to the point of overfitting non-linear, then you’ve been on non-linear for a good amount of time,” Vera says.
This could be due to how PAAMC leverages its parent company’s technology expertise.
PAAMC’s NLP models use news data and the corresponding sentiment scores to rank stocks. As for overfitting, Chai says machine learning offers several remedies, including cross-validation and regularization. “We also use cross-market validations,” he says.
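Chai names cross-validation and regularization, but the article does not describe how PAAMC applies them. The sketch below shows one generic combination, assumed for illustration: a single-feature ridge regression (for example, a stock's return on its news sentiment score) whose penalty `lam` is chosen by k-fold cross-validation. The `cross_market_mse` helper reflects one possible reading of "cross-market validation", fitting on one market and scoring on another; it is a hypothetical interpretation, and the market data are invented.

```python
import statistics


def fit_ridge_1d(xs: list[float], ys: list[float], lam: float) -> float:
    """One-feature ridge regression with no intercept:
    beta = sum(x*y) / (sum(x^2) + lam). A larger lam shrinks beta toward zero."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)


def k_fold_mse(xs: list[float], ys: list[float], lam: float, k: int = 5) -> float:
    """Average held-out squared error across k folds (needs len(xs) >= k)."""
    fold_errors = []
    for fold in range(k):
        held_out = set(range(fold, len(xs), k))  # every k-th point, offset by fold
        train = [i for i in range(len(xs)) if i not in held_out]
        beta = fit_ridge_1d([xs[i] for i in train], [ys[i] for i in train], lam)
        fold_errors.append(statistics.mean((ys[i] - beta * xs[i]) ** 2 for i in held_out))
    return statistics.mean(fold_errors)


def choose_penalty(
    xs: list[float], ys: list[float], candidates: list[float], k: int = 5
) -> float:
    """Pick the penalty with the lowest cross-validated error."""
    return min(candidates, key=lambda lam: k_fold_mse(xs, ys, lam, k))


def cross_market_mse(
    train_market: tuple[list[float], list[float]],
    test_market: tuple[list[float], list[float]],
    lam: float,
) -> float:
    """Hypothetical cross-market check: fit on one market's (xs, ys) data and
    measure the error on another market's data."""
    (xs_a, ys_a), (xs_b, ys_b) = train_market, test_market
    beta = fit_ridge_1d(xs_a, ys_a, lam)
    return statistics.mean((y - beta * x) ** 2 for x, y in zip(xs_b, ys_b))


# Invented data: sentiment scores (x) and next-period returns (y) in two markets.
market_a = ([-0.6, -0.3, -0.1, 0.2, 0.4, 0.5, 0.7, 0.9],
            [-0.012, -0.004, 0.001, 0.003, 0.009, 0.008, 0.015, 0.016])
market_b = ([-0.5, -0.2, 0.1, 0.3, 0.6, 0.8],
            [-0.009, -0.005, 0.002, 0.004, 0.010, 0.013])

best_lam = choose_penalty(*market_a, candidates=[0.0, 0.1, 1.0], k=4)
print(best_lam, cross_market_mse(market_a, market_b, best_lam))
```

Cross-validation guards against a penalty that merely suits one particular split of the data; scoring on a second market adds a stricter test of whether the relationship carries over at all.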
In terms of alternative datasets, PAAMC uses Chinese texts from different media, which Chai says provide signals that are robust and that have low correlations to other alpha streams it has. These streams include fundamental, macro, and price data.
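One way to check whether a sentiment signal has low correlation with other alpha streams is to rank stocks by each signal and correlate the two rankings across stocks (a Spearman-style rank correlation). In this sketch the stock names, the scores, and the choice of a value-style comparison signal are all hypothetical, and ties in scores are not handled.

```python
import statistics


def ranks(scores: dict[str, float]) -> dict[str, float]:
    """Rank stocks from 1 (lowest score) to n (highest); ties are not handled."""
    ordered = sorted(scores, key=scores.get)
    return {stock: float(rank) for rank, stock in enumerate(ordered, start=1)}


def pearson(a: list[float], b: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length series."""
    ma, mb = statistics.mean(a), statistics.mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5


def rank_correlation(signal_a: dict[str, float], signal_b: dict[str, float]) -> float:
    """Spearman-style correlation: Pearson correlation of the two signals'
    rankings over the stocks both signals cover."""
    common = sorted(signal_a.keys() & signal_b.keys())
    rank_a = ranks({s: signal_a[s] for s in common})
    rank_b = ranks({s: signal_b[s] for s in common})
    return pearson([rank_a[s] for s in common], [rank_b[s] for s in common])


# Invented cross-sectional scores for four hypothetical stocks.
sentiment = {"A": 0.8, "B": -0.2, "C": 0.1, "D": 0.5}
value = {"A": 0.3, "B": 0.1, "C": 0.9, "D": 0.4}

print(ranks(sentiment))  # {'B': 1.0, 'C': 2.0, 'D': 3.0, 'A': 4.0}
print(rank_correlation(sentiment, value))  # 0.2
```

With these made-up scores the two rankings correlate at 0.2, so the sentiment signal would add information the value signal does not already carry. In practice such a check would be repeated across many periods rather than on a single cross-section.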
Chai says the robustness of the NLP signals depends heavily on the robustness and sophistication of the models. “It is something we spend a lot of time on to differentiate ourselves from others,” he adds.
Copyright Infopro Digital Limited. All rights reserved.