In earlier posts, we discussed how we might specify and measure speech recognition accuracy when using a constrained grammar model. We also explored how to define the required phrases. If we could trust the users of our system to use only the supported phrases, we would have an almost perfectly functioning system. In this post, we will look at unsupported speech phrases and their likely impact on the system.
Unfortunately, users do not restrict themselves to the defined sentences. Unsupported speech phrases are common in training systems, where new students are not yet familiar with the terminology, and in mission rehearsal systems, where experienced operators have developed commonly adopted “unofficial” terminology. The use of unsupported speech phrases (also called Out-of-Grammar or OOG) undoubtedly has the biggest impact on system usability. Although OOG handling is not technically a speech science problem, it certainly needs to be addressed when specifying speech recognition support for your proposed system.
Unsupported Speech Phrases, The Resulting Outcomes
There are three possible outcomes when a user speaks an unsupported phrase: a correct semantic match, an utterance rejection, or a false positive. The first two will likely have minimal or no impact on the simulation. The difference between a functional system and an unusable system is largely determined by how the system handles false positives.
Correct Semantic Match
In many cases, the inclusion or addition of words will result in an unsupported speech phrase, but will still produce a correct semantic match. In other words, the nature of constrained grammar is such that “close enough” will be good enough. Refer to How to Specify ASR Accuracy for further details on semantic accuracy.
Utterance Rejection
If the recognized utterance is not “close enough,” the sentence will be rejected. In this case, there will be no impact on the connected simulation. However, you should consider how the text-to-speech system will respond when unsupported speech phrases are rejected. When the entire utterance is rejected, the most appropriate response is usually silence.
In many systems, however, an utterance includes multiple linked phrases. If the target entity identification was recognized correctly, but the command intended for that entity was not, a response from the entity asking for clarification would be expected. The requirements specification for handling rejected phrases in a multiple-phrase sentence should be a major design consideration. Using slot-based recognition may add cost, but if implemented correctly, it will significantly enhance usability and realism.
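The rejection behavior described above could be sketched roughly as follows. This is a minimal illustration, not the API of any particular ASR product: the SlotResult type, the respond function, and the "say again" wording are all hypothetical names chosen for the example. It also assumes that an utterance with no recognized entity is treated as fully rejected.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SlotResult:
    """Hypothetical slot-based recognition result (not a real ASR API)."""
    entity: Optional[str]   # recognized target entity, or None if rejected
    command: Optional[str]  # recognized command, or None if rejected


def respond(result: SlotResult) -> Optional[str]:
    """Return the text to synthesize, or None to stay silent."""
    if result.entity is None:
        # Assumption: with no identified entity, nobody can sensibly reply,
        # so the whole utterance is rejected silently.
        return None
    if result.command is None:
        # Entity understood, command rejected: the entity asks for clarification.
        return f"{result.entity}, say again"
    # Both slots recognized: an illustrative readback of the command.
    return f"{result.entity}, {result.command}"


full_rejection = respond(SlotResult(entity=None, command=None))
partial_rejection = respond(SlotResult(entity="American eleven twenty-three", command=None))
```

Here full_rejection is None (silence), while partial_rejection is a clarification request addressed by the entity that was correctly identified.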
False Positives
A false positive is a result in which the speech recognition system incorrectly matches the utterance with a supported phrase. False positives are most common when the system grammar has several similar phrases. Incorrectly set parameters, e.g., the confidence score threshold, can also lead to false positives. A confidence score determines whether an ASR result should be accepted or rejected.
In an example with a very simple grammar of one supported phrase, the following would likely result in a false positive.
Supported phrase – American eleven twenty-three, cleared to land
Spoken phrase – American eleven twenty-three cleared for take-off
Because this grammar contains only one supported phrase, the ASR will return it as the match, producing a false positive, unless the confidence score threshold for rejection is set aggressively.
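To make the role of the threshold concrete, here is a toy sketch of the one-phrase example. It uses a simple word-overlap ratio from Python's difflib as a stand-in for a confidence score. This is an assumption made purely for illustration: a real recognizer derives confidence from its acoustic and language models, not from text comparison. The function names and threshold values are likewise hypothetical.

```python
import re
from difflib import SequenceMatcher
from typing import List, Optional

SUPPORTED_PHRASE = "American eleven twenty-three, cleared to land"


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into words, dropping punctuation."""
    return re.findall(r"[a-z0-9'-]+", text.lower())


def similarity(spoken: str, supported: str = SUPPORTED_PHRASE) -> float:
    """Stand-in confidence score: word-level similarity in [0.0, 1.0]."""
    return SequenceMatcher(None, tokenize(spoken), tokenize(supported)).ratio()


def recognize(spoken: str, threshold: float) -> Optional[str]:
    """Return the supported phrase if the score clears the threshold, else None."""
    return SUPPORTED_PHRASE if similarity(spoken) >= threshold else None


spoken = "American eleven twenty-three cleared for take-off"
lenient = recognize(spoken, threshold=0.5)  # accepted: a false positive
strict = recognize(spoken, threshold=0.8)   # rejected
```

Four of the six words match, so the stand-in score is about 0.67. The lenient threshold accepts the utterance as "cleared to land", while the stricter one rejects it. In a real system, a stricter threshold would also be expected to reject more legitimate utterances, which is why the setting belongs in the requirements discussion.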
Processing false positives will lead to the synthetic entity executing commands that are not contextually reasonable. This often results in an out-of-control simulation scenario through no fault of the system user. Additionally, processing a false positive has a significant negative impact on realism. In most real-world situations, the recipient of an instruction will not blindly accept that instruction if it makes no sense. Note that false positives can also occur with supported phrases in poorly performing speech systems. False positives are equally damaging whether they arise from supported or unsupported phrases.
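One way to picture a recipient that does not blindly accept an instruction is a simple plausibility check made before the synthetic entity acts. The sketch below is only an illustration of that idea. The entity states, the command lists, and the function names are invented for the example and are not taken from any particular simulation.

```python
# Hypothetical mapping from an entity's current state to the commands that
# would make sense in that state (illustrative values only).
PLAUSIBLE_COMMANDS = {
    "on_final_approach": {"cleared to land", "go around"},
    "holding_short": {"cleared for take-off", "line up and wait"},
}


def is_plausible(state: str, command: str) -> bool:
    """True if the command is contextually reasonable for the entity's state."""
    return command in PLAUSIBLE_COMMANDS.get(state, set())


def handle(state: str, command: str) -> str:
    """Execute plausible commands; otherwise query the instruction instead."""
    return "execute" if is_plausible(state, command) else "query"


# An aircraft waiting to depart receives a (false positive) landing clearance.
outcome = handle("holding_short", "cleared to land")
```

In this sketch outcome is "query": the entity questions the instruction instead of executing it, which limits the damage a false positive can do to the scenario.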
Requirements Specification for Unsupported Speech Phrases
Now that we have discussed unsupported speech phrases, the next step is to specify how OOG phrases should be handled. There are several considerations, and each will add cost to the acquisition of your system. This free download discusses this topic and others in greater detail: Speech Recognition Requirements.