Speech Recognition Systems: A Luxury Traders Can Ignore?

THIS WEEK'S LEAD STORIES

Speech recognition is still in its infancy, but Reuters Holdings PLC believes the technology is sufficiently robust to meet the test of the trading room.

Others are less sanguine. "There is no commercially available technology capable of delivering acceptable performance levels in high-noise environments," says Dr. Victor Zue, leading light at the MIT speech communications group.

Zue made the remarks at a symposium on speech processing sponsored by MIT's Industrial Liaison Program. The invitation-only symposium was attended by research and development staff from Wang Labs, NYNEX, Reuters, Dun & Bradstreet, DEC, IBM, Motorola, and Matsushita. (One delegate listed his employer as "Code Z111 Naval Underwater Systems.") MIT has the largest industrial liaison program of any U.S. university, raking in over $34 million in corporate support in 1986.

Hewing to its professed role of technological innovator, Reuters will introduce the first commercial speech recognition application intended for trading. The product will be available as an option to users of the Reuter Dealer Trading System (RDTS), slated for introduction in 1988 (TST September 14).

Details of the implementation haven't been released, but Reuters has been showing the system to prospective users. Eyewitness accounts of these demonstrations, together with information provided by sources at Reuters, have exposed the functional outlines of RDTS's voice recognition offering.

In RDTS, speech recognition will be used both for data entry and command-control functions. A dedicated speech recognition processor will be attached to the terminal, relieving the CPU of any recognition-related work.

Input will be via a telephone receiver, which will be equipped with a switch that instantly engages the speech recognition function.

Voice data entry and commands will be integrated with a series of pull-down menus that can be activated by mouse. Traders will use such commands as "uptick," "downtick," "done," and "all out" to execute trades through RDTS.

The speech recognition application software will probably be installed on the RDTS terminal's hard disk and should occupy less than 640K of the two megabytes of RAM (expandable to 8) on the "IDR 386" box.

How It Works

Speech recognition systems accept spoken input via microphone or telephone, sample the analog input to generate representative digital signatures for each word or phoneme, and compare these signatures against a stored database of like signatures or "templates." When a match is made, the utterance can be interpreted either as data or as a command, depending on the context of the application within which the utterance is made.
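The matching step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the three-number "signatures," the Euclidean distance measure, the `max_distance` rejection threshold, and the names `TEMPLATES`, `distance`, and `recognize` are all hypothetical, and nothing here reflects how RDTS itself is built.

```python
import math

# Hypothetical templates: each vocabulary word maps to a tiny feature vector.
# A real system would store far richer signatures per word or phoneme.
TEMPLATES = {
    "uptick": [0.9, 0.1, 0.4],
    "downtick": [0.1, 0.9, 0.4],
    "done": [0.5, 0.5, 0.9],
}


def distance(a, b):
    """Euclidean distance between two equal-length signatures."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def recognize(signature, templates=TEMPLATES, max_distance=0.5):
    """Return the closest template's word, or None if nothing is close enough."""
    best_word, best_dist = None, float("inf")
    for word, template in templates.items():
        d = distance(signature, template)
        if d < best_dist:
            best_word, best_dist = word, d
    return best_word if best_dist <= max_distance else None
```

A signature close to a stored template, such as `[0.85, 0.15, 0.4]`, resolves to that template's word; one that is far from every template is rejected rather than forced into a match.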

Anything that can be done to limit the size of the database of speech templates against which incoming utterances are compared simplifies the matching task and reduces processing requirements.
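To make the point concrete, the sketch below counts how many template comparisons a matcher performs when the search is restricted to a subset of the vocabulary. The one-number signatures, the `active_words` parameter, and the `match` function are assumptions made for illustration; the article's sources do not say whether RDTS narrows its search this way.

```python
# Hypothetical one-number "signatures", purely for illustration.
TEMPLATES = {
    "uptick": 1.0,
    "downtick": 2.0,
    "done": 3.0,
    "all out": 4.0,
    "one": 5.0,
    "two": 6.0,
}


def match(signature, templates, active_words=None):
    """Return (best_word, comparisons_made) over the active vocabulary."""
    candidates = {
        word: template
        for word, template in templates.items()
        if active_words is None or word in active_words
    }
    if not candidates:
        raise ValueError("no active templates to compare against")
    # One comparison per candidate template: work grows with vocabulary size.
    best = min(candidates, key=lambda word: abs(candidates[word] - signature))
    return best, len(candidates)
```

Searching all six templates costs six comparisons; if the application knows only a digit can come next and passes `active_words={"one", "two"}`, the same utterance is resolved in two.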

Commercial speech processing systems are constrained by processing power and by the need for timely and accurate results. Each recognition exercise is run on a tight budget.

In order to help meet these budgetary demands, the RDTS speech recognition module limits the size of its vocabulary. The module is capable of recognizing only 100 words, and only half that number are used in the current version.

This may be adequate for the government securities and/or foreign exchange markets where the number of instruments traded is comparatively small. But the same size vocabulary would be grossly inadequate for identifying equities, which number in the thousands.

RDTS's speech recognition technology would be unable to distinguish between "IBM" and "IDN," for example.

The longest and most complicated voice recognition tasks required of RDTS will be deciphering utterances of complex numbers such as "two hundred twenty five thousand."
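Once the individual words of such a number have been recognized, they still have to be assembled into a value. The following is a minimal sketch of that assembly step, assuming the recognizer has already produced a clean word sequence; the function name `words_to_number` and the word tables are hypothetical and are not drawn from RDTS.

```python
UNITS = {
    "zero": 0, "one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
    "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10,
    "eleven": 11, "twelve": 12, "thirteen": 13, "fourteen": 14,
    "fifteen": 15, "sixteen": 16, "seventeen": 17, "eighteen": 18,
    "nineteen": 19,
}
TENS = {
    "twenty": 20, "thirty": 30, "forty": 40, "fifty": 50,
    "sixty": 60, "seventy": 70, "eighty": 80, "ninety": 90,
}
SCALES = {"thousand": 1_000, "million": 1_000_000}


def words_to_number(phrase):
    """Convert a recognized phrase such as 'two hundred twenty five thousand'."""
    total = 0    # value of completed groups (thousands, millions)
    current = 0  # value of the group currently being built
    for word in phrase.lower().split():
        if word in UNITS:
            current += UNITS[word]
        elif word in TENS:
            current += TENS[word]
        elif word == "hundred":
            # A bare "hundred" is treated as one hundred.
            current = max(current, 1) * 100
        elif word in SCALES:
            total += max(current, 1) * SCALES[word]
            current = 0
        else:
            raise ValueError(f"unrecognized number word: {word!r}")
    return total + current
```

Under these assumptions, the phrase quoted above evaluates to 225000. A single misrecognized word changes the result or raises an error, which is one reason long numbers are a demanding case.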

Does It Speak Brooklynese?

Speech recognition systems can be classified as either speaker-dependent or speaker-independent. Everyone says the same words a little bit differently -- in some cases a lot differently -- and a system trained to recognize all speakers sacrifices either vocabulary or accuracy.

RDTS uses the speaker-dependent approach because it yields more accurate and timely matching, and, possibly, because of the security features it enables. It is extremely difficult to fool a speaker-dependent system.

On the downside, a speaker-dependent recognition module must be periodically retrained by the user. Training requires the speaker to record each word in the system vocabulary two to four times, defining the database of templates against which it can match input.
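One simple way to turn repeated recordings into a single template is to average them. The sketch below assumes each recording has already been reduced to a fixed-length list of numbers; that assumption, the averaging method, and the name `build_template` are illustrative only, since Reuters has not disclosed how RDTS builds its templates.

```python
def build_template(recordings):
    """Average two to four equal-length recordings of one word into a template."""
    if not 2 <= len(recordings) <= 4:
        raise ValueError("expected two to four recordings per word")
    length = len(recordings[0])
    if any(len(recording) != length for recording in recordings):
        raise ValueError("all recordings must have the same length")
    # Average each feature position across the recordings.
    return [sum(values) / len(recordings) for values in zip(*recordings)]
```

For example, `build_template([[1.0, 2.0], [3.0, 4.0]])` yields `[2.0, 3.0]`. Averaging smooths out differences between repetitions, but it cannot anticipate a voice that later changes, which is why retraining is needed.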

Traders can record these training sessions on audio cassette and re-use them until they cease to be effective. The cassettes also allow traders to move from one workstation to another without repeating the initialization process.

Unfortunately, a cold, a belly ache, or a simple increase in stress can alter an individual's speech enough to reduce matching accuracy. Likewise, a three-martini lunch could render a trader's matching database worthless.

Traders are not noted for their patience, and they certainly won't sit still while a workstation digests their latest utterance. Nor will they be willing to pause before each carefully enunciated word -- whaddayakiddinme?!

It was essential, therefore, that the RDTS unit be capable of recognizing continuous speech in near real-time. Because pronunciation of the same word can vary according to its syntactic context, recognizing continuous speech demands more work than recognizing isolated words.

The accompanying figure (A) shows a series of spectrograms. The Y axis is frequency in kilohertz; the X axis is time; and amplitude is represented both by the darkness of the shaded areas and by the underlying waveform.

The top row of spectrograms represents a series of discrete utterances: the numbers 6-7-9-8-8-2. The lower spectrogram shows the same series of numbers spoken without pause. Note especially the difference between the discrete "8"s and the continuous "8"s caused by omission of the "t" sound in the continuous version.
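Because the same word can be stretched, compressed, or clipped in running speech, a matcher needs some tolerance for timing differences. One classic technique for this is dynamic time warping, sketched below on sequences of single numbers. This is a general illustration, not a description of RDTS, whose matching method has not been released; the name `dtw_distance` and the use of absolute difference as the per-frame cost are assumptions.

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two numeric sequences."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = cheapest alignment of the first i items of a with the first j of b
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(
                cost[i - 1][j],      # a advances while b holds
                cost[i][j - 1],      # b advances while a holds
                cost[i - 1][j - 1],  # both advance together
            )
    return cost[n][m]
```

With this measure, a sequence and a slower version of it, such as `[1, 2, 3]` and `[1, 2, 2, 3]`, are at distance zero, whereas a position-by-position comparison could not even line them up. The price is a grid of comparisons for every template, which adds to the processing load.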

Lower the Cone of Silence

One reason trading system engineers have shied away from speech recognition in the past is ambient noise. Much of the work done on ambient noise suppression has focused on the acoustic environment in the cockpit of a jet fighter. This type of noise is actually much more consistent -- and much less likely to be interpreted as speech -- than noise found in a trading room.

Reuters's solution is to rely on some sort of noise-limiting mechanism in the mouthpiece of the telephone receiver. Like a high-impedance microphone, this helps suppress sound that doesn't seem to emanate from close range of the receiver. Trial by fire is the only reliable means of testing this aspect of the system.

Even if the noise problem is licked, widespread usage of speech recognition may have unwelcome side effects. Information that once was input by mouse or keyboard or other manual device will now be entered by voice. Trading rooms, already noisy, will become noisier. Higher ambient noise levels could affect the performance of traders as well as the effectiveness of speech recognition.

Many suppliers of speech recognition systems distribute their products through OEM deals. Reuters is cooperating with one such supplier, although substantial customization has taken place.
