More banks flirt with machine learning for CCAR—but risks persist

The superior computational grunt of neural networks is attractive to lenders, but a lack of explainability presents a significant downside.

Machine learning techniques are taking hold in US banks’ stress-testing models, bit by bit and byte by byte. Proponents trumpet their ability to calculate revenue and loan-loss forecasts faster than existing methods. But users are running up against a familiar barrier: the difficulty of explaining the complex practices to model validators and regulators.

One large US bank is developing a prototype model for its annual Comprehensive Capital Analysis and Review (CCAR) as well as for the Current Expected Credit Loss (CECL) accounting standard, both of which require forecasting losses based on macroeconomic scenarios.

The model will use machine learning to link economic variables with actual loss forecasts. The bank hopes this will surface insights that would be manually intensive to uncover with traditional modeling techniques.

“Machine learning in theory would do that quicker. Right now, it’s all about testing a few things at a time in different combinations, but you can’t throw everything at a logistic regression equation,” says a finance executive at the bank.

A US global systemically important bank (G-Sib) has been applying neural networks as a challenger to its primary credit risk models when producing its CCAR forecasts. The machine learning model can analyze non-linearities in the data, such as changes in the importance of variables, much faster than a traditional model.

In retail credit, for example, during the first six months of the 39-month CCAR horizon, variables such as income and delinquency history are the primary determinants of default. Further out along the horizon, the importance of such variables decreases, and macroeconomic variables become the primary determinants.

Using traditional methods, it would take data scientists months to instruct the model to capture non-linearities. A neural network can do it in minutes.

“It’s able to learn automatically to break the data into different segments. That’s how we take advantage of this,” says a senior modeling executive at the US G-Sib.
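The shifting drivers the executive describes can be sketched with a toy data-generating process. Everything below is an illustrative assumption rather than any bank's model: a borrower-level score dominates the log-odds of default early in the horizon, and a macro variable takes over later.

```python
import math

# Toy data-generating process (an illustrative assumption, not any bank's
# model): the log-odds of default mix a borrower-level score and a macro
# variable, with weights that shift over the CCAR horizon.

def horizon_weights(month):
    """Borrower effects decay over the horizon; macro effects grow."""
    w_borrower = math.exp(-month / 6.0)   # dominates the first few months
    w_macro = 1.0 - w_borrower            # dominates further out
    return w_borrower, w_macro

def default_prob(month, borrower_score, unemployment_gap):
    """Probability of default for one loan at a given month of the horizon."""
    w_borrower, w_macro = horizon_weights(month)
    log_odds = w_borrower * borrower_score + w_macro * unemployment_gap
    return 1.0 / (1.0 + math.exp(-log_odds))
```

A logistic regression with fixed coefficients cannot represent these month-dependent weights; a modeller would have to segment the horizon by hand, which is the work a neural network can learn to do from the pooled data.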

Because machine learning models can analyze enormous amounts of data, they become valuable for testing ‘extreme but plausible’ stress scenarios that have a low probability but could result in big losses. This is especially true for non-traditional data such as information captured from social media. 

“CCAR is looking at behavior under extreme situations. Traditional models don’t perform well at the extremes. They tend to extrapolate from past data. Machine learning allows for more flexibility and allows other types of data that could complement existing models,” says Evan Sekeris, a former stress-testing official at the US Federal Reserve.

A third US bank says it is looking into a form of machine learning called gradient boosting to predict loss given default for its CCAR models. It is being considered as a challenger to the primary ratings migration model the bank uses for stress-testing. The bank’s model validation unit has had lingering concerns that the primary model is not granular enough: it only analyzes losses down to segment level, rather than loan level.

“We thought maybe we could get better performance using gradient boosting. It’s an off-the-shelf Python package, and it has built-in explainability,” says a model risk executive at the bank.
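The bank does not name its package, but the mechanics of gradient boosting, and the kind of “built-in explainability” the executive mentions, can be sketched from scratch. Each round fits a one-split stump to the current residuals, and accumulating each feature's error reduction yields a native importance score. The LGD features and data here are invented for illustration.

```python
# Minimal from-scratch gradient-boosting sketch for loan-level loss given
# default (LGD). Illustrative only: the bank's actual package, features and
# data are not public.

def fit_stump(X, residuals):
    """Find the single split (feature, threshold) minimizing squared error."""
    best = None  # (sse, feature_index, threshold, left_mean, right_mean)
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [r for row, r in zip(X, residuals) if row[j] <= t]
            right = [r for row, r in zip(X, residuals) if row[j] > t]
            if not left or not right:
                continue
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            sse = (sum((r - lm) ** 2 for r in left)
                   + sum((r - rm) ** 2 for r in right))
            if best is None or sse < best[0]:
                best = (sse, j, t, lm, rm)
    return best

class BoostedLGD:
    """Gradient boosting with one-split trees and squared loss."""

    def __init__(self, n_rounds=100, lr=0.1):
        self.n_rounds, self.lr = n_rounds, lr

    def fit(self, X, y):
        self.base = sum(y) / len(y)
        self.stumps = []
        self.importance = [0.0] * len(X[0])  # per-feature error reduction
        pred = [self.base] * len(y)
        for _ in range(self.n_rounds):
            residuals = [yi - pi for yi, pi in zip(y, pred)]
            prior_sse = sum(r * r for r in residuals)
            stump = fit_stump(X, residuals)
            if stump is None:
                break
            sse, j, t, lm, rm = stump
            self.importance[j] += prior_sse - sse  # "built-in" explainability
            self.stumps.append((j, t, lm, rm))
            for i, row in enumerate(X):
                pred[i] += self.lr * (lm if row[j] <= t else rm)

    def predict(self, row):
        p = self.base
        for j, t, lm, rm in self.stumps:
            p += self.lr * (lm if row[j] <= t else rm)
        return p
```

Because each round records which feature it split on and how much error that split removed, the importance scores fall out of training for free, with no post hoc explainer needed, which is the appeal for model validation.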

How does it work?

But if machine learning promises additional speed and accuracy, its drawbacks are added complexity and cost. While techniques such as Shap and Lime are useful for explaining the workings of the more basic machine learning models, it becomes much more difficult to explain deep learning models such as neural networks, whose outputs are determined by the interactions of thousands of nodes in a network, rather than linear equations. 

“When you run machine learning, you don’t know how it’s making decisions. You don’t have the equations, and it can become so complex that there’s no way to interpret it,” says Sekeris.
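Shap and Lime have their own machinery, but the family of post hoc, model-agnostic explainers they belong to can be illustrated with a simpler relative, permutation importance: shuffle one input column at a time and measure how much a black-box model's error deteriorates. This is a sketch of the general idea, not either library's algorithm.

```python
import random

def permutation_importance(predict, X, y, n_repeats=5, seed=0):
    """Model-agnostic importance: shuffle one column, measure the error hit.

    `predict` is treated as a black box, which is the point: the technique
    needs no access to the model's internal equations.
    """
    rng = random.Random(seed)

    def mse(rows):
        return sum((predict(r) - yi) ** 2 for r, yi in zip(rows, y)) / len(y)

    baseline = mse(X)
    importances = []
    for j in range(len(X[0])):
        total = 0.0
        for _ in range(n_repeats):
            col = [row[j] for row in X]
            rng.shuffle(col)  # break the link between feature j and the target
            shuffled = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, col)]
            total += mse(shuffled) - baseline
        importances.append(total / n_repeats)
    return importances
```

A feature the model never uses scores zero, while features driving the output score high. The catch Sekeris alludes to remains: for a deep network, knowing *which* inputs matter still does not recover the equations relating them.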

Recent academic research suggests interpretable neural networks can be applied to CCAR. However, there are doubts over whether the specific technique used in the research would work in practice. For example, the paper uses time series data to forecast credit card charge-offs. The executive at the US G-Sib says this data is overly simplistic because it fails to take into account the fact that the composition of the portfolio will change over the 39-month CCAR horizon. “That paper has a fundamental flaw in not having the portfolio composition reflected. No bank will be allowed to do that,” he says.

More broadly, some are skeptical whether machine learning models—even simple ones—will convince model validation experts of their suitability. Michael Jacobs, head of first-line model development validation at PNC Financial Services Group, says such models could give rise to operational risk once put into live deployment.

He adds: “In the view of the independent validation, the incremental improvement in model performance may not be warranted by the additional complexity.”

Machine learning models are also data-hungry, requiring large amounts of data to work effectively. But the economic data used in CCAR is relatively small, which limits the additional value machine learning can provide. With smaller datasets the learning is only partial: the model will keep finding correlations in the data without achieving any real insight into what drives them.
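A deliberately extreme toy makes the small-sample point concrete: a maximally flexible model, here a polynomial that interpolates every training point exactly as a stand-in for an over-parameterised learner, achieves zero training error on a handful of noisy observations yet is badly wrong between them. The data are invented.

```python
def interpolating_model(xs, ys):
    """Fit a model flexible enough to pass through every training point
    (Lagrange interpolation), standing in for an over-parameterised learner."""
    def predict(x):
        total = 0.0
        for i, (xi, yi) in enumerate(zip(xs, ys)):
            term = yi
            for k, xk in enumerate(xs):
                if k != i:
                    term *= (x - xk) / (xi - xk)
            total += term
        return total
    return predict

def mse(predict, xs, ys):
    """Mean squared error of a prediction function on a dataset."""
    return sum((predict(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)
```

On eight noisy observations of a simple linear relationship, the flexible model memorises the noise: training error is exactly zero, while error at unseen points in between is orders of magnitude larger. With CCAR's short macroeconomic history, the same failure mode is hard to rule out.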

A technique developed by the modeling team at Wells Fargo helps address the problem of explainability of certain types of neural networks. It works by breaking down neural network models into smaller models that can be more easily explained.

The ability to interpret neural networks without resorting to post hoc explainability techniques could allow banks to deploy such a model as their primary CCAR model or retain it as a challenger, as they see fit. In the latter case, the neural network is used to refine the results of the primary model.

The senior modeling executive at the US G-Sib says: “As a data scientist, I have two choices: implement the neural network as it is, or build a tactical model informed by the neural network. When you structure the network so it becomes interpretable just like a traditional statistical model, then you have high confidence to implement it as is. If not, then you just translate to a statistical model.”
