Point Break: How Vendors Push Their Products to the Limit

Vendors don’t release new products or updates without putting their software through rigorous testing. What does that testing involve, and what different approaches do companies employ? Max Bowie finds out.

  • As mission-critical elements of trading firms’ architectures adopt new technologies, such as cloud, suppliers must test their products more rigorously to minimize service issues.
  • Enterprise data services, in particular, must be bulletproof, as they interface with—and therefore have the potential to impact—many other systems.
  • Engineering groups are incorporating QA and testing functions into the development workflow, enabling testing to benefit from agile methodologies.
  • From “Gorilla Testing” to “Chaos Testing,” vendors are employing novel techniques to simulate random circumstances, beyond merely testing code quality.
     

They ride in the tour bus, but they aren’t the rock stars. Behind the scenes at any concert, an army of sound engineers and audio technicians ensure that the music reaches the audience. The capital markets also have rock stars of their own—the high-profile traders whose activities make headlines, and the technologists who build the systems that can trade off of headlines.

Capital markets firms and their technology providers take the rollout of new software as seriously as a musician takes sound quality. They employ entire organizations devoted to testing and re-testing code before it is deployed into production, and they leverage any new tool or technique that can put code through its paces—even if it means breaking it and having to start again—to ensure new software is battle-tested before it reaches the front line.

[Photo: Jim Nevotti]

They take it seriously because of what’s at stake: money—and lots of it. “Our products are used to trade or manage risk and compliance in the financial markets, so by their nature, any releases that we do are always very high risk,” says Jim Nevotti, president of trading and risk management software vendor Sterling Trading Tech. “If we release a trading platform and get the symbology or something else wrong, or if it doesn’t support a certain order type, there are large amounts of money associated with what can go wrong.”

Errors resulting from a lack of testing can be costly—not just the risk of clients losing money if something doesn’t work correctly, but also the time and money required to investigate problems, fix software, and roll out patches, as well as any lost revenue from dissatisfied clients.

But before investing time and money in building software—let alone testing it to the breaking point—organizations should first test their hypothesis: Should they really be building this particular product or enhancement? Is it really necessary? And just because it’s a cool idea, or a few clients ask for it, will that translate into something that provides sufficient value that users will be willing to pay for it?

[Photo: Terry Roche]

“First, we assertively work to validate that what we are doing has value for the customer,” says Terry Roche, co-founder and CEO of startup enterprise technology vendor Pegasus Enterprise Solutions. “We attack it from every angle, and make sure it is entirely validated with clients applying our deep, real-world experiences. A lot of organizations don’t do that.”

This is an important strategy when approaching “moonshot” projects that go beyond business-as-usual tasks and routine process automation, says Agrim Singh, hacker-in-residence at Citi Ventures D10X, a division of the bank’s venture arm tasked with identifying new technologies for use by Citi and its clients.

“Part of that is obviously you want to be doing a lot of generation of ideas and generation of the problem statements you want to tackle. But you also want to find ways of assessing risk around these [ideas and problems by] building and breaking things so that you can quickly validate or invalidate whether your assumptions are correct, your problem statement is correct, whether your solution hypothesis is correct, and there are various steps within the product line,” Singh says. “That’s part of my role as well—to not only talk to the people at the beginning to validate problems, but also build and break things to see what you receive at the end.”

Singh is a fan of moving quickly and testing concepts internally and with clients to prevent wasting time developing and testing ideas that may be flawed or unwanted, and is a proponent of hackathons that can validate ideas within a day, and getting “quick and dirty” demonstration products into clients’ hands quickly to solicit feedback.

[Photo: Gene Fernandez]

FactSet Research Systems’ recent project to migrate its ticker plant to the cloud is an example of a project that was validated on paper, but where the vendor wasn’t entirely convinced of its viability—and whether cloud could offer the stability and low latency required by such a critical piece of infrastructure—until it had put the cloud ticker through an exhaustive series of tests. Only then could it be fully confident in the platform’s resilience, which proved especially important given this year’s volatility.

“Historically, and this year especially, we’ve seen some incredible volumes. We were able to simulate … close to 10 times the highest volumes we’ve ever seen. We could see the impact of that load, and we could see the environment being loaded differently from the average, but there was no impact on clients. You see a drop in terms of latency, but we were still achieving our service-level agreements with plenty of headroom,” says Gene Fernandez, chief product and technology officer at FactSet. “Our engineers started out completely skeptical that it could even be done. But then they started to see statistics that showed it might be possible. And then they saw statistics that we might be able to materially improve things. So they went from being skeptics to being advocates.”

The vendor spent the first half of this year planning and testing the cloud ticker, stress-testing it and deliberately trying to find ways to make it fail, to understand its limits and learn how it would react when faced with certain situations, from unexpected bursts of market volume to systems outages.
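
FactSet hasn’t published its test harness, but the shape of such a burst test can be sketched in a few lines of Python: replay synthetic ticks through a stand-in handler, measure per-message latency, and check the 99th percentile against a latency budget. The handler, the message counts, and the SLA figure below are illustrative assumptions, not FactSet’s.

```python
import random
import statistics
import time

# Illustrative numbers only -- not FactSet's actual volumes or SLAs.
BURST_MULTIPLIER = 10          # "close to 10 times the highest volumes we've ever seen"
N_MESSAGES = 200_000           # synthetic ticks to replay in this toy run
SLA_P99_MICROS = 500.0         # hypothetical 99th-percentile latency budget

def handle_tick(tick: dict) -> dict:
    """Stand-in for the real ticker-plant normalization path."""
    return {"symbol": tick["symbol"], "mid": (tick["bid"] + tick["ask"]) / 2}

def run_burst_test() -> None:
    latencies_us = []
    for _ in range(N_MESSAGES):
        tick = {"symbol": "TEST", "bid": random.uniform(99, 100), "ask": random.uniform(100, 101)}
        start = time.perf_counter()
        handle_tick(tick)
        latencies_us.append((time.perf_counter() - start) * 1e6)
    p99 = statistics.quantiles(latencies_us, n=100)[98]
    print(f"p99 latency: {p99:.1f}us (budget {SLA_P99_MICROS}us, "
          f"headroom {SLA_P99_MICROS - p99:.1f}us at {BURST_MULTIPLIER}x load)")
    assert p99 < SLA_P99_MICROS, "burst replay breached the latency SLA"

if __name__ == "__main__":
    run_burst_test()
```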

To simulate outages, FactSet employed a tactic known as “gorilla testing”—“like if a gorilla got loose in the datacenter and started ripping out cables and servers—so we can see how the system responds,” Fernandez says.

In these simulations, the vendor began by imagining what scenarios might occur. It then assigned engineers to a “red” team and a “white” team: the red team spent its time doing all it could to take the system down, while the white team observed and chronicled the impact on the system and its environment without intervening to defend against the red team’s efforts.
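
As an illustration of that split, the sketch below models a toy replicated cluster in Python: a “red team” function randomly kills nodes, while a “white team” function only records how many nodes remain and whether the service is still available. The cluster and node names are hypothetical; FactSet’s actual tests ran against real infrastructure.

```python
import random

class Cluster:
    """Toy replicated service: available while at least one node is up."""
    def __init__(self, nodes):
        self.up = set(nodes)

    def kill(self, node):
        # The "gorilla" rips out a server.
        self.up.discard(node)

    def is_available(self):
        return len(self.up) > 0

def red_team(cluster, rounds):
    """Do everything possible to take the system down."""
    for _ in range(rounds):
        if cluster.up:
            cluster.kill(random.choice(sorted(cluster.up)))
        yield cluster

def white_team(observations):
    """Observe and chronicle the impact -- never intervene."""
    log = []
    for i, cluster in enumerate(observations, start=1):
        log.append((i, len(cluster.up), cluster.is_available()))
    return log

if __name__ == "__main__":
    cluster = Cluster([f"server-{i}" for i in range(4)])
    for round_no, nodes_left, available in white_team(red_team(cluster, rounds=4)):
        print(f"round {round_no}: {nodes_left} nodes up, available={available}")
```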

Prevention Is Better than a Cure

Many of the things a firm or vendor will test for may not be major faults, but rather changes in an update or new feature that cause previous features to not work properly. The new feature may have been tested independently and found to work fine, but code does not exist in a vacuum. It has to interact with other components and work seamlessly as part of an enterprise-wide software framework.

“Environmental dependencies may not always be explicit as to how a change affects other areas,” and software bugs can be code that works as designed, but the designed usage may be “erroneous behavior deliberately introduced as a result of misunderstanding or miscommunication around functionality requirements,” says Denis Chekhlov, chair of Bloomberg’s automated testing guild. “So at each point, you need to think about what you are testing and for what purpose—the business logic itself, and the underlying integration with other systems and data.”
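
That distinction maps naturally onto separate test layers. A minimal pytest sketch, with a hypothetical business function and a temporary file standing in for a downstream system, might separate the two like this (registering the `integration` mark in the pytest configuration avoids a warning):

```python
import pytest

def notional(quantity: int, price: float) -> float:
    """Hypothetical business logic under test."""
    return quantity * price

def test_notional_business_logic():
    # Pure business-logic check: no external systems involved.
    assert notional(100, 25.5) == 2550.0

@pytest.mark.integration
def test_notional_persists_to_downstream_store(tmp_path):
    # Stand-in for integration with another system (here, a file acting
    # as the downstream store); run separately with `pytest -m integration`.
    store = tmp_path / "risk_feed.csv"
    store.write_text(f"ORDER-1,{notional(100, 25.5)}\n")
    assert store.read_text().strip() == "ORDER-1,2550.0"
```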

[Photo: Denis Chekhlov]

Testing also can’t be viewed as a substitute for due diligence during the development process. Compromising or skipping steps during development—or not ensuring that different components that are part of an ecosystem involving multiple teams of engineers and business staff work properly together—may make it harder to test the entire code base once complete.

“With any defect, prevention is definitely better than a cure,” Chekhlov says. “You want to introduce practices that prevent regressions from sneaking into production. The goals are to spot bugs as early as possible in development, and to fix them fast—even when they are spotted late, and even when they are spotted after they are already deployed to customers.”

And because bugs may not be coding errors but rather the result of unintended use, issues can’t always be completely prevented, adds Sterling’s Nevotti.

“There are always unintended consequences from real-world use of your software, and you can never account for everything,” he says. “You just have to find them and modify your procedures to account for them.”

But the more resources a company has to put software through its paces internally, the less chance there is that any bugs will wind up in customers’ hands. Caleb Eplett, chief product officer at fundamental data analysis provider YCharts, describes how any new software undergoes multiple “rigorous” peer reviews and a “run book” that combines engineers assigned to test specific areas of code, automated tests, and “user-style tests” that mimic users building a chart or portfolio. Only after that does software enter the vendor’s staging environment, where it is released first to an internal test group and only then to a small group of clients.

[Photo: Caleb Eplett]

The process is well-documented: engineers communicate what works and what doesn’t over the communications platform Slack, so that everything is transparent and easily searchable. It’s also managed from the top down by CTO Ara Anjargolian and vice president of engineering Kevin Fox, and is evaluated regularly for potential improvements.

“Especially if something slips through the cracks, we’ll do a post-mortem, find out where we failed in the testing process, and fix it so it can’t happen again,” Eplett says.

Sterling Trading Tech follows a similarly rigorous internal process of stress testing—challenging its own order throughput and performance—and regression testing by its dedicated quality assurance team, before making new software available to client service and technical teams, who might spot “real world” client issues that developers may not have thought of. After that, the vendor releases software into the world in three stages: a limited rollout to early adopters or clients who had requested specific enhancements, then a soft launch, followed by a full production release. Typically, a new release takes between four and eight weeks to go from finished development to general availability, Nevotti says.

Like YCharts, Sterling prefers to make small, frequent releases, rather than rolling out a bunch of updates all at once, believing this approach ensures greater platform stability because there are fewer variables that could go wrong, while smaller updates are quicker and easier to test.
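
Sterling hasn’t described its release tooling, but a three-stage rollout of this kind is often implemented as a simple cohort gate. The sketch below, with hypothetical client IDs and stage thresholds, shows one way to route an explicit early-adopter list, then a percentage-based soft launch, then everyone, onto a new release.

```python
import hashlib

# Hypothetical stages mirroring the rollout described above.
STAGES = {
    "early_adopters": {"explicit": {"client-017", "client-202"}, "percent": 0},
    "soft_launch":    {"explicit": set(), "percent": 20},
    "production":     {"explicit": set(), "percent": 100},
}

def _bucket(client_id: str) -> int:
    """Deterministically map a client to a 0-99 bucket."""
    digest = hashlib.sha256(client_id.encode()).hexdigest()
    return int(digest, 16) % 100

def release_enabled(client_id: str, stage: str) -> bool:
    cfg = STAGES[stage]
    return client_id in cfg["explicit"] or _bucket(client_id) < cfg["percent"]

if __name__ == "__main__":
    for stage in STAGES:
        enabled = sum(release_enabled(f"client-{i:03d}", stage) for i in range(1000))
        print(f"{stage}: {enabled}/1000 clients see the new release")
```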

“It evolved that way naturally, and we’ve refined it over the years,” Nevotti says. The company also evolved toward the agile development methodology, and now tends to work in two-week sprints. “It evolved that way because we have a strong desire to be responsive to clients. When they ask for changes, we don’t want them to have to wait six months for the next annual release, when we can schedule it for the next development cycle,” he says.

Clean Up as You Go

John Eley, CEO of data management platform vendor GoldenSource, and senior vice president of product management Tom Stock are also both supporters of the agile model, but have applied several unique twists that they believe give the vendor an edge when it comes to testing.

[Photo: John Eley]

For example, the vendor has instituted a monthly review of its code quality, where its developers present their progress toward corporate-wide goals to Eley; it surveys clients annually on their perception of its quality; and it has multiple levels of testing during and after the development process, including a group called the “model office,” which is independent of its development group and tests software the way clients would use it.

“I don’t think we’ve created any earth-shattering measure of quality, but we’ve assembled them in a way that’s unique to us,” Eley says.

The vendor has also made a big investment in its DevOps function over the past couple of years, to make sure any additions don’t interfere with existing functions.

Another area of investment over the same period has been automated testing and tracking tools. Currently, GoldenSource has an employee writing test scenarios, but it envisages automating the generation of test scenarios in the future, which should speed up the testing process while allowing its testing staff to focus on more complex scenarios.

“If you can auto-generate some of the functional tests, you can focus more on business testing, and being able to test the way end users would. For example, you may find usability problems that technical test staff didn’t know to test for,” Stock says. “The ‘model office’… is not a generally accepted concept that all software companies follow—it’s something we do that we don’t believe everyone else does.”
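
Stock doesn’t name the tools GoldenSource is evaluating, but auto-generating functional test scenarios is commonly done with property-based testing. A sketch using the open-source `hypothesis` library, against a hypothetical data-normalization function, shows the idea: the library generates hundreds of input scenarios rather than a tester scripting each one.

```python
from hypothesis import given, strategies as st

def normalize_price(raw: float, scale: int) -> float:
    """Hypothetical stand-in for a data-mastering transformation."""
    return round(raw / (10 ** scale), 6)

@given(
    raw=st.floats(min_value=0, max_value=1e9, allow_nan=False, allow_infinity=False),
    scale=st.integers(min_value=0, max_value=6),
)
def test_normalized_price_is_never_negative(raw, scale):
    # hypothesis generates the scenarios a human tester would
    # otherwise have to script by hand.
    assert normalize_price(raw, scale) >= 0
```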

New Approaches

Sometimes, as with Sterling, approaches to testing evolve gradually over time. In other cases, it takes a new hire to bring a different viewpoint and approaches gained from experience at other organizations.

In YCharts’ case, although robust practices were already in place, it was the arrival of Sean Brown as CEO that saw those practices turned into processes. “When Sean came in, he insisted that we create well-documented processes for the things we were already doing,” Eplett says.

[Photo: Derek Ferguson]

At Fitch Solutions, the software arm of ratings provider Fitch Group, that person is Derek Ferguson, who joined the vendor one year ago as head of technology in Fitch’s IT development solutions division. Prior to Fitch, Ferguson spent 11-and-a-half years at JP Morgan Chase, where he was most recently head of engineering for its commercial banking division, and had previously served as lead order management system developer for the bank’s private client workstation.

Over the past year, under Ferguson’s tenure, the vendor reengineered FitchConnect—its flagship subscription-based web app for accessing market research—from a single piece of software into a set of micro-frontends—the client-facing equivalent of back-end microservices—that coexist within a browser. This has resulted in major improvements in the efficiency of Fitch’s development team, and in how quickly customers receive new features.

Previously, FitchConnect needed to be taken completely offline to roll out new software releases, which could only be done over evenings and weekends. By breaking down the service into a series of micro-frontends, specific functions can be upgraded intraday without impacting other components of the system, and the vendor need only test each new feature being introduced, rather than re-testing the entire system.

“When I came in, our core FitchConnect web application was a single piece of code, and to change one thing, you had to change all of it—and that was the main thing slowing us down,” Ferguson says. “For example, every time we released a new feature, we had to re-test everything. And although the testing scripts were good, there are about 20,000 scripts, so it would take two days to run.”

His first task was to increase the use of automated testing scripts. “When I arrived, most of the work around testing was to automate processes. Some of it was already automated and was part of the build process, and some of it was manual. So to be an agile squad, automation needed to be part of the build process,” Ferguson says. Now, using micro-frontends, it takes a developer only five minutes to test an individual new component, allowing them to ship new features much faster, and resulting in a 25% increase in productivity.
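
Fitch hasn’t published its build pipeline, but the principle of testing only what changed, rather than rerunning all 20,000 scripts, can be sketched as a small build step that maps changed source paths to component test suites. The directory and component names below are hypothetical.

```python
import subprocess
import sys

# Hypothetical mapping of micro-frontend source trees to their test suites.
COMPONENT_TESTS = {
    "frontends/research_search/": "tests/research_search/",
    "frontends/ratings_view/":    "tests/ratings_view/",
    "frontends/portfolio/":       "tests/portfolio/",
}

def suites_for(changed_files):
    """Return only the test suites whose component actually changed."""
    return sorted({
        tests for prefix, tests in COMPONENT_TESTS.items()
        if any(path.startswith(prefix) for path in changed_files)
    })

if __name__ == "__main__":
    changed = sys.argv[1:]  # e.g. the output of `git diff --name-only main`
    suites = suites_for(changed)
    if not suites:
        print("no component changed; skipping test run")
    else:
        sys.exit(subprocess.call(["pytest", *suites]))
```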

Fitch also utilizes two key tools to monitor productivity—BlueOptima, which monitors how much code its developers have added each month, and JIRA, a system that tracks workloads and allows the vendor to review metrics around the productivity of its sprints. “BlueOptima can tell good code from bad code on a technical basis. JIRA tells us something equally important—we may be going fast and building good code, but is it the right code for the business?” Ferguson says.

Automation is also key to Pegasus’ plans—not only to instill trust in the quality of its products among potential clients, but also to support future plans to make its services more widely available. Pegasus’ attention to detail in its testing—it spends equal time writing and testing code—reflects the fact that it doesn’t just plan to sell software, but also to make its source code available to clients, along with its testing capabilities, says co-founder and chief product officer Brian Stephens, who previously served as head of market access, market data, and middleware technology at Royal Bank of Scotland.

“The testing discipline stems from a deep understanding of what clients require after my long experience on the client side. When we built the first version of our APIs in Java, we spent the first few months writing tests, from unit tests that look at one component to integration tests that test certain systems and scenarios,” Stephens says. “We’ve ended up building upwards of 500 tests in each language for our APIs—and that number continues to grow.”

If It Ain’t Broke, Break It!

Without structure and automation where possible, testing can descend into chaos. But for some, chaos theory is actually a desirable testing methodology.

“Chaos testing in software engineering is the equivalent of crash tests in the automotive industry. It’s a very controlled way of looking at a system as a whole … and being able to experiment with a system to increase confidence,” says Mikolaj Pawlikowski, software engineering project lead and chaos engineering expert at Bloomberg. “We started doing this in early 2016 because we were working on a Kubernetes microservices platform to allow developers to deploy code quickly. At first, Kubernetes patches were flowing faster than we could deliver them … so we started writing scripts to simulate failures preemptively to gain confidence in our Kubernetes setup.”
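
Bloomberg’s scripts aren’t public, but the basic move in that style of experiment (deliberately killing a workload and checking that the platform heals itself) looks roughly like the following sketch, which uses the official Kubernetes Python client. The namespace and label selector are hypothetical.

```python
import random
from kubernetes import client, config

def kill_one_pod(namespace="demo", label_selector="app=orders"):
    """Delete one matching pod at random; a healthy Deployment should replace it."""
    config.load_kube_config()  # or config.load_incluster_config() when run inside the cluster
    v1 = client.CoreV1Api()
    pods = v1.list_namespaced_pod(namespace, label_selector=label_selector).items
    if not pods:
        print("nothing to break")
        return
    victim = random.choice(pods)
    print(f"deleting {victim.metadata.name}")
    v1.delete_namespaced_pod(victim.metadata.name, namespace)
    # A follow-up check would assert that the replica count recovers within an SLO.

if __name__ == "__main__":
    kill_one_pod()
```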

[Photo: Mikolaj Pawlikowski]

However, this “chaos” isn’t really chaos: it’s a carefully organized exercise in which the only surprise is the attack itself, and engineers must be sufficiently organized for firms to benefit from it. “We’re trying to push a more scientific approach with more focus on tools and analysis, and less randomness. I think the industry as a whole is moving to that. But it does require a certain level of maturity … and you have to understand what something is expected to withstand before you try to break it,” Pawlikowski says.

Fitch’s Ferguson is also a proponent of a tools-based approach. In addition to the other tools it uses to support its development and testing, Fitch Solutions has licensed a tool called Gremlin, an enterprise version of the “Chaos Monkey” developed by streaming video service Netflix.

“If you look at companies like Netflix, Facebook, Microsoft, and others, they are probably using or starting to use Chaos Testing. But in the financial industry, firms are probably only just getting started using this,” Ferguson says, acknowledging that Fitch itself is among these firms not yet ready to fully unleash chaos.

Currently, the vendor simulates a complete outage quarterly. Chaos testing—when configured with access to a company’s AWS account—will deliberately switch off parts of services hosted in AWS, or reconfigure them to simulate degraded network quality.
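
Gremlin’s own configuration isn’t shown here, but the same idea, switching off part of a service hosted in AWS, can be illustrated with a minimal boto3 sketch that stops one running instance carrying a hypothetical opt-in tag.

```python
import random
import boto3

def stop_one_tagged_instance(tag_key="chaos-eligible", region="us-east-1"):
    """Stop a single running instance that has opted in to fault injection via a tag."""
    ec2 = boto3.client("ec2", region_name=region)
    reservations = ec2.describe_instances(
        Filters=[
            {"Name": f"tag:{tag_key}", "Values": ["true"]},
            {"Name": "instance-state-name", "Values": ["running"]},
        ]
    )["Reservations"]
    instances = [i["InstanceId"] for r in reservations for i in r["Instances"]]
    if not instances:
        print("no opted-in instances running")
        return
    victim = random.choice(instances)
    print(f"stopping {victim}")
    ec2.stop_instances(InstanceIds=[victim])

if __name__ == "__main__":
    stop_one_tagged_instance()
```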

“Everything we have runs in the cloud,” Ferguson says. “If we took that to the next step, we would build it in a way that would be resilient to any outages. We’re not there yet, and chaos testing is the best way to do that. But we aren’t ready to introduce chaos testing right now as we don’t believe the results would be good. … That won’t happen in 2021—maybe the year after.”

One advantage of chaos testing is that it is the antithesis of how an efficient development organization works and the way engineers think about code: code is logical and structured, and engineers’ approaches to testing are usually similarly logical and structured. Those approaches work well for testing expected issues, but they don’t necessarily account for unpredictable events, whereas chaos is by nature unpredictable.

“You can be preemptive and try to figure out what might break, and test the way you want something to behave—it’s not rocket science. But when you introduce randomness, you can spot things you didn’t predict,” Pawlikowski says.

Testing Times

Of course, the ultimate random element was Covid-19, which wreaked havoc on firms’ and vendors’ activities throughout 2020. Pegasus’ Stephens warns that with budget restrictions and fewer resources available as a result of the pandemic, new technologies being implemented may not have been tested to the usual levels of rigor demanded by regulators, while not introducing new technologies may result in firms continuing to use legacy systems beyond their sell-by date.

“Specifically during Covid, banks’ focus has very much been on keeping the lights on, and a lot of new projects have stopped. That tells me that a lot of these teams are running very lean and don’t have the capacity to evaluate and test new services,” Stephens says.

This could impact startup vendors’ ability to test code, since some companies rely heavily on clients in their early phases to identify issues in their code while they are still developing functionality. For example, a vendor’s approach to testing can change significantly from its startup days to when it reaches a size and scale where it has the resources to establish a full testing group, says Matthew Storey, co-founder and chief product officer at regulatory and compliance technology provider SteelEye.

matt-storey-steeleye
Matt Storey

“In our early days, clients were flagging issues to us … and some clients still like to see early versions of the software,” Storey says, which not only provides extra sets of eyes on the software, but also brings the experience and understanding of client practitioners. “Our developers and engineers … might understand a bug in the code, but might not necessarily understand the challenge that the client is facing. … It’s important for us all to know what effect it has on a client if we release bad code.”

Sterling’s Nevotti also highlights how practices continuously evolve as a company matures and changes. “As you grow larger and gain more critical mass, you have more to spend on testing, and also you become more important and critical to clients, so you are held to higher standards compared to when you were a startup,” which requires more defined procedures and better testing tools, he says. “A couple of years ago, our testing team couldn’t test everything because they didn’t have the right tools, so we had to go out and invest in new technologies.”

Though SteelEye and Sterling are past that phase, other startups may find the current climate particularly challenging or costly—not just in terms of making sales during Covid, but also ensuring their code is as clean as it can be prior to entering clients’ production environments.

One thing’s for sure: The more time you invest in testing, the less time your systems are likely to be offline as a result of any errors in their code or issues with your infrastructure. Or, as Bloomberg’s Pawlikowski says, “The more you test during the day, the better you’ll sleep at night.”
