Open Platform: Cash in on Your Cache: Using CPU Cache, Hybrid Models and Vectors to Ditch Database Delays
Only 15 years ago, mentioning a database management system (DBMS) that used main memory as its storage would elicit incredulous looks from IT managers or software developers. "How is that different from database caching?" they might respond dismissively, prompting the explanation that an in-memory database system (IMDS) keeps all records in RAM, so by eliminating disk-based storage it can also jettison its cache management sub-system and other latency-inducing trappings of traditional disk-based DBMSs.
IMDSs, and their significant performance advantage over disk-based DBMSs, are much better understood today. In fact, if media buzz and database vendors' product strategies are any indication, IMDSs are becoming the "new normal" for high-performance analytics and data-intensive real-time systems. This embrace of main memory as database storage is prompting trading organizations and developers to ask how database management technology can deliver further competitive advantage, now that their data is in main memory.
Some of the most promising answers involve a shift from DRAM (dynamic random-access memory) to a different kind of memory: the CPU cache, a faster pool of memory integrated with the CPU itself and situated right next to the processing cores. If time-sensitive calculations need data, having it in CPU cache is the best situation; otherwise, the IMDS must fetch it from main memory. Fetching adds latency, partly due to the Front Side Bus (FSB) or QuickPath Interconnect (QPI) bottlenecks in and out of CPU cache: the CPU processes data two to four times faster than data can be delivered through these portals. Maximizing the amount of relevant data in CPU cache, and minimizing fetches, can therefore accelerate database system performance.
Achieving this means revising basic DBMS design assumptions. Traditional DBMSs (including IMDSs) are row-oriented: they organize data into tables of rows and columns and act on stored data row by row. These DBMSs assemble rows into database pages, their unit of input/output, and the page implicitly becomes the unit for transferring data in and out of CPU cache.
The problem with the row-based approach for capital markets is that the inputs for these operations are typically market data in the form of a time series (ticks, closing prices, etc.). And, critically, a time series usually occupies only one or a few columns within a database table. Say we want to analyze two years of closing prices for a security. Closing price occupies one column, while symbol, opening price, volume and date occupy others in the table, so each row contains one element (closing price) that is needed, and four that aren't. Row-oriented processing fetches entire rows into CPU cache, including this irrelevant 80 percent.
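To make the trade-off concrete, consider the following minimal Python/NumPy sketch; the field names, types and series length are hypothetical, and NumPy merely stands in for a database engine's storage layouts:

import numpy as np

N = 1_000_000  # hypothetical series length, chosen only for illustration

# Row-oriented layout: symbol, opening price, closing price, volume and
# date are packed together, so consecutive closing prices sit 40 bytes
# apart and most of every fetched cache line is irrelevant fields.
rows = np.zeros(N, dtype=[("symbol", "S8"), ("open", "f8"),
                          ("close", "f8"), ("volume", "i8"), ("date", "i8")])
avg_from_rows = rows["close"].mean()  # strided scan: ~80% of traffic wasted

# Column-oriented layout: closing prices form one contiguous array, so
# every byte brought into CPU cache is data the calculation needs.
close_col = np.ascontiguousarray(rows["close"])
avg_from_col = close_col.mean()       # sequential scan: cache fully used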
A column-oriented approach, which assembles data into pages column by column, avoids flooding the cache with unwanted data in this way. In the example above, closing-price data would be gathered into a database page, up to the page-size limit, with each fetch bringing five times as much closing-price data into CPU cache as the row-based approach. A newer idea is to support both column- and row-based data handling in the same database instance. Many capital markets applications manage naturally columnar data alongside "general" data that is suited to the traditional row-based approach, so a solution that supports hybrid data designs promises greater flexibility.
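What such a hybrid design might look like can be sketched as follows; the class and field names are invented for illustration and do not reflect any vendor's schema:

import numpy as np

# Hypothetical hybrid record: general "row" facts are kept together for
# record-at-a-time access, while the naturally columnar time series is
# stored as its own contiguous sequence for columnar scans.
class Security:
    def __init__(self, symbol, exchange, lot_size):
        self.symbol = symbol      # row-oriented scalar fields,
        self.exchange = exchange  # read and written as a unit
        self.lot_size = lot_size
        self.closes = np.empty(0, dtype="f8")  # columnar sequence

    def append_close(self, price):
        # A real engine would manage pages of the column; np.append
        # copies the array and is used here only for illustration.
        self.closes = np.append(self.closes, price)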
Another big opportunity for database system optimization lies in keeping market data entirely within the CPU cache while analyzing and/or transforming it. Traditional DBMSs or IMDSs manage interim processing results outside the CPU cache: market data is fetched into the CPU and processed, with the output materialized in main memory (e.g. in a temporary database table). If further processing is required (e.g., executing back-to-back statistical functions on market data), the database system queries the temporary table and brings the result into CPU cache for the next operation. Even if the data is handled in a columnar fashion, this back-and-forth trafficking of interim results adds latency.
A vector-based approach to handling market data can avoid these performance-sapping handoffs. With a DBMS that supports vector languages or vector-based functions, developers work with entire sequences (vectors) of market data as if they were scalar types (e.g. variables containing a single value). This can slash the amount of code required to perform tasks. Some CPUs also provide special vector-processing capabilities, such as SIMD instruction sets, which these programming languages and functions can tap.
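As an illustration, with plain NumPy standing in for a vector-capable database language (the closing-price series here is invented), compare scalar-style code with its vector-style equivalent for a five-day moving average:

import numpy as np

closes = np.random.rand(1_000_000)  # invented closing-price series
window = 5

# Scalar-style code: explicit element-by-element bookkeeping.
ma_loop = np.empty(len(closes) - window + 1)
for i in range(len(ma_loop)):
    ma_loop[i] = closes[i:i + window].mean()

# Vector-style code: the whole series is handled as if it were a single
# value, and the work is dispatched to kernels that can use SIMD units.
ma_vec = np.convolve(closes, np.ones(window) / window, mode="valid")

assert np.allclose(ma_loop, ma_vec)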
Vector-based languages do not, on their own, solve the problem of managing interim results. Typically, combined operations must execute over entire vectors before data is passed along for further processing, and these vectors may be too large to be held in cache. In that case, the processor has to fetch chunks of each vector from, and write interim results to, main memory, meaning two trips across the slower QPI/FSB. This data transfer overhead multiplies as chains of processing grow more complex. For example, in the expression E = A * (B + C) / D, all the elements of vector B must be added to the corresponding elements of vector C; that interim result vector's elements must then be multiplied by the corresponding elements of vector A; finally, that new interim result vector's elements must be divided by the corresponding elements of vector D, and only then can the result be stored in vector E.
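A rough sketch of that whole-vector evaluation, again in NumPy with invented sizes, makes the interim materialization visible:

import numpy as np

# Four vectors of ~80 MB each: far too large for any CPU cache.
A, B, C, D = (np.random.rand(10_000_000) for _ in range(4))

T1 = B + C   # full-length interim vector, written out to main memory
T2 = A * T1  # T1 is read back; a second full-length interim is written
E = T2 / D   # a third full pass over main memory yields the result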
This limitation can be avoided with a new vector-based approach that keeps market data in CPU cache as it moves through a pipeline of statistical functions. Blocks of market data called "tiles" are assembled from vectors of market data, and are the unit of transfer into and out of cache and between functions within cache. With a library of pipelined, vector-based functions, this example can be rewritten in SQL as:
SELECT seq_div( seq_mul( A, seq_add( B, C ) ), D ) AS E FROM SOMETABLE;
Processing is triggered by invocation of the first function in the pipeline, which brings a tile into CPU cache. When function "seq_add" finishes its work on the tile, its output (a tile of transformed data) is passed to function "seq_mul"; that output goes to function "seq_div"; then a fully transformed tile of result set E is transferred back to main memory, and a new tile can enter CPU cache if more data is needed to complete the operation. Tiles from earlier stages in the pipeline are replaced by "more transformed" tiles further down the line, eliminating the need to send interim results back to main memory. The entire series of operations executes on data that is pipelined from one function to another; the interim results are never materialized; and the whole calculation occurs at the CPU's speed, undelayed by transfers across the QPI/FSB.
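The tiling idea itself can be sketched in a few lines of Python; the tile size is hypothetical, and a production engine would run this loop as compiled code rather than interpreted Python:

import numpy as np

A, B, C, D = (np.random.rand(10_000_000) for _ in range(4))
E = np.empty_like(A)

TILE = 16_384  # hypothetical tile size, small enough to stay cache-resident

for i in range(0, len(A), TILE):
    j = i + TILE
    t = B[i:j] + C[i:j]  # tile-sized interim result, never a full vector
    t *= A[i:j]          # each stage overwrites the less-transformed tile
    t /= D[i:j]
    E[i:j] = t           # only the fully transformed tile returns to memory

The open-source numexpr library applies the same blocked-evaluation idea to NumPy expressions, and is a convenient way to observe the effect outside a database system.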
As market data volumes soar, technologists seek a "better mousetrap" database system to help identify profitable trading opportunities, curtail risk, and streamline the functioning of financial exchanges. Older ideas such as main memory data storage don't go away, but become the foundation on which new techniques are built.