Semantics Takes Tech to Tango

Jamie Hyman investigates the symbiotic relationship between technology and standards: exciting technological breakthroughs are pushing standards toward industry-wide adoption, while the very survival of some technology—namely, blockchain—depends on the advancement of semantics and ontologies.


While the recent development of revolutionary semantic technologies provides a glimpse of data’s exciting future, the technological foundation for semantics and standards is both solid and ancient. 

More than 2,000 years ago, the Greeks first imagined ontologies. The originator is believed to be Pythagoras, the mathematician and scientist who led the first discussions on how to categorize human existence around 500 BC, and since then the concept has been applied to a wide variety of knowledge representation. The present-day, cutting-edge technology used to implement ontologies for financial data management is built upon directed graphs, a mature science that can be traced to 1735, when the Swiss mathematician Leonhard Euler applied it to the Seven Bridges of Königsberg puzzle—or more precisely, used it to prove the puzzle could not be solved.

Karla McKenna, Citi

The final layer in financial ontologies’ foundation is the world wide web, invented by English engineer Sir Tim Berners-Lee about three decades ago. In 2009, Berners-Lee began working with the British government to make data more open and accessible. He argued that the standards developed for the internet could be built upon to create semantic models of data, which is precisely what is happening among the financial industry’s leaders in the standards space.

“The truth is, in many cases, the devil is in the technical details,” says Karla McKenna, director of market practice and standards for Citi Markets and Securities Services.

When tracing through this history of standards and ontologies, “the technology is the easy part,” says David Saul, senior vice president and chief data scientist at State Street.

David Saul, State Street

“We are on an absolutely solid technical foundation here,” Saul says. “We now need people to start filling out those models, people who know what the data elements mean to work with industry associations within standards groups. That’s the hard part.”

Toppling the Tower of Babel

Bloomberg Enterprise Data delivers reference, pricing and regulatory data to clients as semantic data via the web. Matthew Rawlings, chief data officer for Bloomberg Enterprise Data, says technologies from the World Wide Web Consortium (W3C), the main international standards organization for the world wide web, such as the Resource Description Framework (RDF) and the Web Ontology Language (OWL), are “absolutely essential” for having standard-interpretable data, not only in finance but across other industries as well.
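To make the idea concrete, here is a minimal sketch of what expressing reference data as RDF triples looks like in practice, using Python’s rdflib library. The namespace, instrument and property names are invented for illustration; they are not Bloomberg’s or any standard’s actual vocabulary.

```python
# A minimal sketch of reference data expressed as RDF triples using
# rdflib. Every name under the "ex" namespace is a hypothetical
# illustration, not a real financial vocabulary.
from rdflib import Graph, Literal, Namespace, RDF

EX = Namespace("http://example.org/securities#")  # invented vocabulary

g = Graph()
g.bind("ex", EX)

bond = EX.Bond12345  # a hypothetical instrument
g.add((bond, RDF.type, EX.CorporateBond))
g.add((bond, EX.issuer, EX.AcmeCorp))
g.add((bond, EX.couponRate, Literal(0.045)))

# Turtle is one of the W3C's standard serializations for RDF
print(g.serialize(format="turtle"))
```

Because each fact is a self-describing subject-predicate-object triple, the data carries its own vocabulary along with it, which is the property Rawlings describes.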

“I think for delivery of that data, the linked data platform standards and the technologies that support that, they generally tend to be very simple technologies, but the standards are conventions governing how to use this stuff to keep things working together,” Rawlings says, a distinction that is key for Bloomberg, as the vendor delivers integrated reference data and pricing, “so our data comes with its own ontology.”
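As a rough illustration of that delivery model, the same set of facts can be handed to clients in any of the W3C’s standard serializations without changing its meaning. The sketch below uses rdflib again, with an invented vocabulary; it is not a description of Bloomberg’s actual pipeline.

```python
# One set of facts, delivered in several W3C serializations. The
# "ex" vocabulary and the sample data are invented for illustration.
from rdflib import Graph

g = Graph()
g.parse(data="""
@prefix ex: <http://example.org/refdata#> .
ex:InstrumentX ex:price 101.25 ;
               ex:currency "USD" .
""", format="turtle")

for fmt in ("turtle", "json-ld", "xml"):  # "xml" here means RDF/XML
    print(f"--- {fmt} ---")
    print(g.serialize(format=fmt))
```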

Rawlings was the original author of ISO 20022, a single standardization approach for financial services developed through the International Organization for Standardization. Jim Northey is a technical committee co-chair for the FIX Trading Community and, this coming January, will become chair of Technical Committee (TC) 68, which authors, supports and maintains ISO 20022. The standard is currently under revision with the goal of adding semantic capability.

Jim Northey, FIX

“I’m looking at using semantic technology to supplement, support and improve the operational efficiency around our existing messaging structures,” Northey says, and he is also “looking at a more practical use of web semantics,” specifically, basing new semantics technology on web languages. 

“[TC 68] is looking to use these tools,” Northey says. “They seem to be much more powerful and you can do much more with them than, say, the previous generation of tools based on the Unified Modeling Language for traditional object-oriented design techniques that are very static in nature.”

McKenna, who is the current chair of ISO/TC 68, says many ISO 20022 users only see and are familiar with financial messages, “which are really the product of all of the knowledge and experts describing a process and the kind of information that needs to be present in order to transact business. The messages are important, but the strength of the model in the end is really what leads us to do the next-step semantic work.” 

A Portal to the Future

In some cases, the technological developments are catching up with data management concepts ahead of their time. 

“I tried to introduce semantics into ISO 20022 in 2004 and it worked, and we could prove it worked, but it was too early for most people to comprehend and adopt. It didn’t have enough support from the technology suppliers into the industry,” Rawlings says. “I think what’s changed now is we have enough of an ecosystem of tools and infrastructure and experience.”

Part of TC 68’s work is to unify ideas with tools capable of implementation.

“It’s great to talk conceptually about an ontology and a semantic representation of data. Now, we have technologies that are able to implement this and they’re able to do it in efficient ways. We can really talk about moving these things into production,” says State Street’s Saul. 

The multi-standard semantic portal may be the most mature example of technology thrusting semantics forward, to a place where results are both evident and within reach. 

“The ISO working group is normalizing multiple standards into a single semantic format derived from ISO 20022 so that they can compare and reason about the standards and discover new facts and relationships across the disparate standards which should help to converge and interoperate more effectively,” McKenna says. “And then TC 68 came up with their own tool based on open standards, which they’ve named the multi-standard semantic portal.”

Northey says the portal is the result of a fundamental shift in TC 68’s approach. 

“We were trying to evaluate semantics to see if they could be used to map between the various disparate standards and that was taking quite a bit of work. I don’t know if it was making any headway,” Northey says. Someone on the committee—Northey says he can’t remember who—suggested creating a SPARQL endpoint that would allow users to run and execute queries across standards. 

“SPARQL is the language used to query semantic databases, or databases that are represented in subject-predicate-object format,” he says. “What we wanted to do was try and get all of the standards into the same format, in a semantic format, so they were stored in the same model so that we could actually use SPARQL to apply questions across all of them.”

The portal is an open-source stack of software in which TC 68 stores data in a semantic database and uses semantic language to query it, with results coming back in one of three semantic formats.
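In rough outline, the kind of cross-standard question the portal makes possible might look like the sketch below, again using rdflib. The graph, predicates and sample data are invented for illustration; the portal’s actual model is not described here.

```python
# A rough sketch of a SPARQL query across standards stored as triples.
# The "ex" predicates and the sample data are invented for illustration.
from rdflib import Graph

g = Graph()
g.parse(data="""
@prefix ex: <http://example.org/standards#> .
ex:ISO20022_Account ex:label "Account" ; ex:definedBy ex:ISO20022 .
ex:FIX_Account      ex:label "Account" ; ex:definedBy ex:FIX .
""", format="turtle")

# Find elements that share a label but come from different standards
results = g.query("""
    PREFIX ex: <http://example.org/standards#>
    SELECT ?a ?b WHERE {
        ?a ex:label ?name ; ex:definedBy ?stdA .
        ?b ex:label ?name ; ex:definedBy ?stdB .
        FILTER (?stdA != ?stdB)
    }
""")
for row in results:
    print(row.a, "corresponds to", row.b)
```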

Northey is cautious not to oversell the technology, saying the multi-standard semantic portal is not a rich semantic model with meaning and ontology. 

“We just took what we had and we put it into a semantic format because we think it’s easier to operate on and reason about. We think we’re going to make more progress toward convergence, doing it this way,” he says. “We changed out the toolset, and we think changing out the toolset to semantic tools will give us some capability.”

He adds that the technology is not perfect, noting that the stack is “a little brittle” and the commercial tooling capabilities are lagging. But State Street’s Saul says the multi-standard semantic portal is the real deal. 

“[The portal] is not one of those flashy new technologies that’s going to promise too much and not deliver,” he says. “I’ve been through enough of those to know this one’s different. It’s a solid foundation. That doesn’t mean it’s easy to go. It’s taking a lot of work, but the thousands of hours people are putting into it are starting to pay off and we’re starting to see the benefits.” 

McKenna agrees that the multi-standard semantic portal has huge potential for powerful insights. 

“Based on what they’ve learned and the experience they’ve gained from the multi-standard semantic portal, they then will write down these recipes for approaches for each of these standards and produce technical reports that others can use to establish this interoperability between ISO 20022 and a particular standard,” she says.

Northey says while the tools continue to be a work in progress, they are market-ready. 

“My opinion is that companies should be using this stuff right now,” he says. “I know one of the major banks that actually uses this semantics processing to process a lot of their regulatory response and analyze documents and it’s amazing what they’re doing with the technology right now. I fundamentally believe it to be real. I think there are learning curves and I think there are tooling issues that have to be addressed, but if I were investing and if it were my money, I would be pursuing this path.” 

Northey says TC 68 is now working to get additional standards into a common semantic format and then to bring that commonality into ISO 20022, making it part of the ISO standards.

“If we can get this common structure, to do comparisons and ask questions and do modeling, I think the output of that will be all of the standards will start to converge as opposed to diverge,” he says. “Then I think the next level [is the ability] to use artificial intelligence techniques or semi-autonomous learning to actually find relationships and discover concepts across, say, two or three different standards. That’s something we’re particularly excited about.”

A Study in Mutualism

Many of the relationships between technology and data in financial services are symbiotic and semantics follows that pattern: While underlying technology must be at a certain level of maturity to move ontologies forward, likewise, certain technologies rely on the advancement of standards to thrive. 

Matthew Bastian, CUSIP

Matthew Bastian, director of market and business development and West Coast operations at CUSIP Global Services, says there has been significant progress as a result of pairing distributed ledger technology (DLT) with established reference data standards. For example, CUSIP is collaborating with Templum Markets to assign CUSIP identifiers to all tokenized asset offerings traded on Templum’s blockchain-based platform.

“Our view is that established standards can be the link to those other environments, and along the way, it might give a broader swath of market participants, as well as regulators, a better understanding or perhaps comfort level with the new technology. In a roundabout way, I think it’s that overriding concept that is going to ensure that standards still have a place in spite of the focus on new technologies like DLT,” Bastian says. 

He is referring to a theory occasionally floated that blockchain will render all of this semantics progress obsolete because distributed ledger technology eliminates the need for back-end processing. Bastian says the belief that technology can trump standards goes back at least as far as CUSIP’s founding.

“We were formed during the paper crunch in the early days of computerized trading and at the time, you had different financial firms using their own internal codes for tracking securities in their own master files, holdings, and all those post-trade activities that I described,” he says. “What everyone quickly realized was that nothing was interoperable. Carry that forward to today—if you have multiple smart security formats or different DLT schemes operating in parallel, it takes us back to that same problem. From our perspective, we’re really just trying to deliver that same standard and common language and allowing market participants from around the globe to communicate, even if the platform and its underlying technology are different.”

The core problem—that data built, named and managed in silos is incompatible front to back and across applications—is not solved because blockchain allows trades to be processed more quickly and irrefutably. 

“This data work that we’re trying to do, to get to really understand what an account is, or natural person identifier or legal entity identifier, and get that transmitted through the process front to back—those data issues and those identifier issues don’t go away just because you have blockchain,” Northey says. 

Blockchain faces the same data issues as traditional trades, he adds, and setting up different blockchains without standards or fundamental data structures simply creates new silos. 

“Putting erroneous data, or data that doesn’t match up with the pre-trade data in post-trade on a DLT isn’t going to be any more helpful,” he says. “DLT may give us a way of instantaneously recording something in an irrefutable manner that we all share, but the issues I’m talking about, with these data items through the life cycle—identifiers, data values and the meaning of those values—that problem doesn’t get solved by DLT.” 

Even if data (and its accompanying record) can be instantly created, the market still requires an understanding of the data so that machines can interpret it. 

“This ain’t going away with DLT. In fact, [the problem] could be exacerbated,” because incompatible data would be posted instantly, Northey says.  “[Semantics provides] a platform where we can get most of our standards all in the same format, so we can talk to them in the same way. That’s a real asset. We still have to have data standards across all these different distributed ledgers. The data’s the same.” 
