Open Platform: Cash in on Your Cache - Using CPU Cache, Hybrid Models and Vectors to Ditch Database Delays

Steve Graves

Only 15 years ago, mentioning a database management system (DBMS) that used main memory as its storage would elicit incredulous looks from IT managers or software developers. "How is that different from database caching?" they might dismissively respond, prompting the explanation that by eliminating disk-based storage, an in-memory database system (IMDS) could also jettison its cache management sub-system (since an IMDS keeps all records in RAM) and other latency-inducing trappings of traditional disk-based DBMSs.

IMDSs, and the fact that they perform significantly faster than disk-based DBMSs, are much better understood today. In fact, if media buzz and database vendors' product strategies are any indication, IMDSs are becoming the "new normal" for high-performance analytics and data-intensive real-time systems. This embrace of main memory as database storage is prompting trading organizations and developers to ask how database management technology can deliver further competitive advantage, now that their data is in main memory.

Some of the most promising answers involve a shift from DRAM (dynamic random-access memory) to a different kind of memory: the CPU cache, a faster pool of memory that is integrated with the CPU itself and situated right next to the processing cores. When time-sensitive calculations need data, the best case is for that data already to be in CPU cache. Otherwise, the IMDS must fetch it from main memory. Fetching adds latency, partly due to the Front Side Bus (FSB) or QuickPath Interconnect (QPI) bottlenecks in and out of CPU cache: the CPU processes data two to four times faster than it can be delivered through these portals. Therefore, maximizing the amount of relevant data in CPU cache and minimizing fetches can accelerate database system performance.

Achieving this means revising basic DBMS design assumptions. Traditional DBMSs (including IMDSs) are row-oriented, organizing data into tables of rows and columns and acting on stored data row by row. These DBMSs group rows into database pages, the unit of input/output, which implicitly becomes the unit for transferring data in and out of CPU cache.

The problem with the row-based approach for capital markets is that the inputs for these operations are typically market data in the form of a time series (ticks, closing prices, etc.). And, critically, a time series usually occupies just one or a few columns within a database table. Say we want to analyze two years of closing prices for a security. Closing price occupies one column, while symbol, opening price, volume and date occupy others in the table, so each row contains one element (closing price) that is needed and four that aren't. Row-oriented processing fetches entire rows into CPU cache, including the 80 percent that is irrelevant to the calculation.
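
To make the cost concrete, here is an illustrative sketch in C (the struct, field and function names are hypothetical, not taken from any particular DBMS): a row-oriented layout is, in effect, an array of structs, so summing closing prices drags every other field through the cache along with them.

#include <stddef.h>

/* Hypothetical row layout: one struct per row, rows stored contiguously.
   Only 'close' is needed, but the CPU fetches whole cache lines, so the
   other four fields ride along into cache with every row touched. */
struct quote_row {
    char   symbol[8];
    double open;
    double close;
    double volume;
    long   date;      /* e.g. days since some epoch */
};

double sum_closes_rowwise(const struct quote_row *rows, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += rows[i].close;   /* roughly 80 percent of each fetched row is unused */
    return sum;
}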

A column-oriented approach that assembles data into pages column-by-column avoids flooding the cache with unwanted data in this way. In the example above, a column of closing price data would be gathered in a database page, up to the page size limit, with each fetch bringing five times more closing price data into CPU cache than the row-based approach. A newer idea is to provide support for both column- and row-based data handling in the same database instance. Many capital markets applications manage naturally columnar data alongside "general" data that is suited to the traditional row-based approach, so a solution that supports hybrid data designs promises greater flexibility.
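
By contrast, a columnar layout resembles a struct of arrays, so the same scan touches only closing-price data. Again, this is an illustrative sketch with assumed names rather than any vendor's implementation.

#include <stddef.h>

/* Hypothetical columnar layout: each column is its own contiguous array,
   so scanning closing prices pulls only closing-price cache lines. */
struct quote_columns {
    char   (*symbol)[8];   /* array of 8-character symbols */
    double *open;
    double *close;
    double *volume;
    long   *date;
    size_t  n;             /* number of rows represented */
};

double sum_closes_columnar(const struct quote_columns *t)
{
    double sum = 0.0;
    for (size_t i = 0; i < t->n; i++)
        sum += t->close[i];    /* every byte fetched is a closing price */
    return sum;
}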

Another big opportunity for database system optimization lies in keeping market data entirely within the CPU cache while analyzing and/or transforming it. Traditional DBMSs or IMDSs manage interim processing results outside the CPU cache: market data is fetched into the CPU and processed, with the output materialized in main memory (for example, in a temporary database table). If further processing is required, such as executing back-to-back statistical functions on market data, the database system queries the temporary table and brings the result into CPU cache for the next operation. Even if the data is handled in a columnar fashion, this back-and-forth trafficking of interim results adds latency.

A vector-based approach to handling market data can avoid these performance-sapping handoffs. With a DBMS that supports vector languages or vector-based functions, developers work with entire sequences (vectors) of market data as if they were scalar types (i.e. variables containing a single value). This can slash the amount of code required to perform tasks. Many CPUs also provide special instructions for handling vectors, which these programming languages and functions can tap.

Vector-based languages do not, on their own, solve the issue of managing interim results. Typically, combined operations must execute over entire vectors before data is passed along for further processing, and these vectors may be too large to be held in cache. In that case, the processor has to fetch chunks of each vector from, and write interim results to, main memory, meaning two trips across the slower QPI/FSB. Data transfer overhead multiplies as chains of processing grow more complex. For example, to evaluate E = A * (B + C) / D, all the elements of vector B must be added to the corresponding elements of vector C; that interim result vector's elements must then be multiplied by the corresponding elements of vector A; finally, the new interim result vector's elements must be divided by the corresponding elements of vector D, and only then can the result be stored in vector E.
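
A rough sketch of that whole-vector style of evaluation (hypothetical C code, not any particular vendor's API) makes the cost visible: each stage materializes a full-length interim array in main memory before the next stage can start.

#include <stddef.h>
#include <stdlib.h>

/* Naive whole-vector evaluation of E = A * (B + C) / D.
   Each operation runs over the full vectors, so the interim results
   'bc' and 'abc' are materialized in main memory and must be re-fetched
   across the FSB/QPI for the next step. (Error handling omitted.) */
void eval_naive(const double *a, const double *b, const double *c,
                const double *d, double *e, size_t n)
{
    double *bc  = malloc(n * sizeof *bc);   /* interim: B + C       */
    double *abc = malloc(n * sizeof *abc);  /* interim: A * (B + C) */

    for (size_t i = 0; i < n; i++) bc[i]  = b[i] + c[i];
    for (size_t i = 0; i < n; i++) abc[i] = a[i] * bc[i];
    for (size_t i = 0; i < n; i++) e[i]   = abc[i] / d[i];

    free(bc);
    free(abc);
}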

This limitation can be avoided with a new vector-based approach that keeps market data in CPU cache as it moves through a pipeline of statistical functions. Blocks of market data called "tiles" are assembled from vectors of market data, and are the unit of transfer into and out of cache and between functions within cache. With a library of pipelined, vector-based functions, this example can be rewritten in SQL as:

SELECT seq_div( seq_mul( A, seq_add( B, C ) ), D ) AS E FROM SOMETABLE;

Processing is triggered by invocation of the first function in the pipeline, which brings a tile into CPU cache. When function "seq_add" finishes its work on the tile, its output (a tile of transformed data) is passed to function "seq_mul," and that output goes to function "seq_div." A fully transformed tile of the result set E is then transferred back to main memory, and a new tile can enter CPU cache, if needed, to complete the operation. Tiles from earlier stages in the pipeline are replaced by "more transformed" tiles further down the line, eliminating the need to send interim results back to main memory. The entire series of operations executes on data that is pipelined from one function to another; the interim results are never materialized; and the whole calculation runs at the CPU's speed, not delayed by transfers across the QPI/FSB.
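
The pipelined alternative can be pictured roughly as follows (a simplified C sketch; the tile size and the plain loops are assumptions standing in for the library's internal machinery): the expression is evaluated one cache-sized tile at a time, so interim results stay in CPU cache and only finished tiles of E are written back to main memory.

#include <stddef.h>

#define TILE 1024   /* assumed tile size, chosen to fit comfortably in CPU cache */

/* Tiled evaluation of E = A * (B + C) / D. Each tile is fully transformed
   while it sits in CPU cache; interim results are never written to main
   memory, only the finished tile of E. */
void eval_tiled(const double *a, const double *b, const double *c,
                const double *d, double *e, size_t n)
{
    for (size_t base = 0; base < n; base += TILE) {
        size_t len = (n - base < TILE) ? n - base : TILE;
        double tile[TILE];                    /* cache-resident working tile */

        for (size_t i = 0; i < len; i++)      /* stage 1: B + C            */
            tile[i] = b[base + i] + c[base + i];
        for (size_t i = 0; i < len; i++)      /* stage 2: multiply by A    */
            tile[i] *= a[base + i];
        for (size_t i = 0; i < len; i++)      /* stage 3: divide by D -> E */
            e[base + i] = tile[i] / d[base + i];
    }
}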

As market data volumes soar, technologists seek a "better mousetrap" database system to help identify profitable trading opportunities, curtail risk, and streamline the functioning of financial exchanges. Older ideas such as main memory data storage don't go away, but become the foundation on which new techniques are built.
