The Impacts of Unsupported Speech Phrases

Unsupported Speech Phrases

In earlier posts, we discussed how to specify and measure speech recognition accuracy when using a constrained grammar model for simulation applications. We also explored how to define the phrases that must be supported. If we could trust the users of our system to speak only the supported phrases, we would have an almost perfectly functioning system. Unfortunately, in many cases, users do not limit themselves to the defined sentences. Unsupported speech phrases, also called Out-of-Grammar or OOG phrases, occur not only in training systems, where new students are not yet familiar with the terminology, but also in mission rehearsal systems, where experienced operators have developed commonly adopted “unofficial” terminology. The use of unsupported speech phrases undoubtedly has the biggest impact on system usability. Although OOG problems are not technically a speech science problem, they certainly need to be addressed when specifying speech recognition support for your proposed system.

 

The Impacts of Unsupported Speech Phrases

There are three possible outcomes when a user speaks an unsupported phrase. Two of them, a correct semantic match and an utterance rejection, will likely have minimal or no impact on the simulation. How the third outcome, a false positive, is handled will make the difference between a functional system and an unusable one.

Correct Semantic Match

In many cases, the omission or addition of words will produce an unsupported speech phrase that still yields a correct semantic match. In other words, the nature of constrained grammar is such that “close enough” will be good enough. Refer to How to Specify ASR Accuracy for further details on semantic accuracy.

Utterance Rejection

If the recognized utterance is not “close enough,” the sentence will be rejected, and there will be no impact on the connected simulation. However, consider how you will handle any text-to-speech response in this situation. When the entire utterance is rejected, the most appropriate response from the text-to-speech system is usually silence. In many systems, though, an utterance includes multiple linked phrases. If, for example, we are confident that the target entity identification was recognized correctly but the command intended for that entity was not, a response from the entity asking for clarification would be expected. The requirements specification for handling rejected phrases in a multiple-phrase sentence should be a major design consideration. Slot-based recognition, in which each phrase within the utterance is recognized and scored separately, may add cost, but if implemented correctly it will significantly enhance usability and realism.
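The multiple-phrase case can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Slot` class, the `respond` function, the 0.7 threshold, and the "say again" wording are all hypothetical and do not come from any particular speech recognition product.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Slot:
    """One recognized phrase within an utterance (hypothetical structure)."""
    value: str
    confidence: float  # assumed to be in the range 0.0 to 1.0


def respond(entity: Slot, command: Slot, threshold: float = 0.7) -> Optional[str]:
    """Return the text-to-speech response, or None for silence."""
    if entity.confidence < threshold:
        # We do not know who was addressed, so the whole utterance is rejected.
        return None
    if command.confidence < threshold:
        # The entity was identified but the command was not: ask for clarification.
        return f"{entity.value}, say again"
    # Both slots accepted: read the command back.
    return f"{command.value}, {entity.value}"


callsign = "American eleven twenty-three"
print(respond(Slot(callsign, 0.9), Slot("cleared to land", 0.3)))
print(respond(Slot(callsign, 0.2), Slot("cleared to land", 0.9)))
```

In this sketch, a poorly recognized command with a well-recognized call sign produces a clarification request, while a poorly recognized call sign produces silence.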

False Positives

A false positive is an error in which the speech recognition system incorrectly matches the utterance to a supported phrase. False positives are most common when the system grammar contains several similar phrases, or when speech recognition settings such as confidence thresholds are incorrectly defined. (A confidence score is widely used to determine when to accept an utterance as correct and when to reject it.)

In an example with a very simple grammar of one supported phrase, the following would likely result in a false positive.

Supported phrase – American eleven twenty-three, cleared to land

Spoken phrase – American eleven twenty-three cleared for take-off

Given that there is only one supported phrase, it is highly likely that the speech recognition system will return a false positive for that phrase, unless the confidence score threshold for rejection is set aggressively.
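The effect of the rejection threshold on this example can be illustrated with a toy sketch. It assumes that word-level similarity (computed here with Python's standard `difflib.SequenceMatcher`) is an acceptable stand-in for a confidence score; real recognizers derive confidence from acoustic and language model evidence, so the numbers are illustrative only. The function names and threshold values are hypothetical.

```python
from difflib import SequenceMatcher

# The one-phrase grammar from the example above.
SUPPORTED_PHRASES = ["american eleven twenty-three cleared to land"]


def words(text: str) -> list:
    """Lower-case the text, drop commas, and split into words."""
    return text.lower().replace(",", " ").split()


def toy_confidence(spoken: str, supported: str) -> float:
    """Word-level similarity in [0, 1]; a crude stand-in for ASR confidence."""
    return SequenceMatcher(None, words(spoken), words(supported)).ratio()


def recognize(spoken: str, threshold: float):
    """Return (matched phrase or None if rejected, score of the best match)."""
    best = max(SUPPORTED_PHRASES, key=lambda p: toy_confidence(spoken, p))
    score = toy_confidence(spoken, best)
    return (best if score >= threshold else None), score


spoken = "American eleven twenty-three cleared for take-off"
for threshold in (0.5, 0.8):
    print(threshold, recognize(spoken, threshold))
```

Four of the six spoken words match the supported phrase, so the toy score is about 0.67. A lenient threshold of 0.5 accepts the utterance as "cleared to land" (a false positive), while a stricter threshold of 0.8 rejects it. A stricter threshold also rejects more legitimate utterances, which is why the setting needs to be tuned rather than simply maximized.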

If the connected simulator processes a false positive, the synthetic entity will execute commands that are not reasonable within the context of the scenario, often leading to an out-of-control situation that is no fault of the system user. Processing a false positive also has a significant negative impact on realism: in most real-world situations, the recipient of an instruction will not blindly accept it if it makes no sense. Note that false positives can also occur with supported phrases in poorly performing speech systems. False positives are equally damaging whether they arise from supported or unsupported phrases.
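One way to picture a recipient that does not blindly accept an instruction is a plausibility check between the recognizer and the simulator. The sketch below is an assumption for illustration, not a description of any specific system: the state names, the `VALID_STATES` table, and the `handle_command` function are all hypothetical.

```python
# Hypothetical table: the entity states in which each command makes sense.
VALID_STATES = {
    "cleared to land": {"on_approach"},
    "cleared for take-off": {"holding_short", "lined_up"},
}


def handle_command(entity_state: str, command: str) -> str:
    """Return 'execute' if the command fits the entity's state, else 'query'."""
    if entity_state in VALID_STATES.get(command, set()):
        return "execute"
    # The command makes no sense in context: ask for confirmation
    # instead of acting on what may be a false positive.
    return "query"


# An aircraft holding short of the runway "hears" a landing clearance.
print(handle_command("holding_short", "cleared to land"))
```

Here the aircraft waiting to depart queries the landing clearance rather than acting on it, which is closer to how a real pilot would react.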

Specifying OOG Handling Requirements

Now that we have discussed the impacts of unsupported speech phrases, we need to consider how to specify OOG handling. There are several considerations, and each will add cost to the acquisition of your system. The next article will delve more deeply into the various options.
