The previous post, The Impact of Unsupported Phrases, introduced the topic of unsupported phrases, also called out-of-grammar (OOG) inputs. That post explained that OOG is not specifically a speech recognition problem. Even so, dealing with out-of-grammar inputs deserves serious thought. Two questions are worth asking: where can OOG be most damaging in your application, and how should poor OOG management be mitigated? Effective OOG handling can be the difference between a usable and an unusable connected application.
In a constrained grammar application, the ASR will only find a match against previously defined terminology. Any test that involves open free speech will lead to unquantifiable and invalid accuracy results. Mathematical accuracy is a valuable starting point, but traditional accuracy measures do not represent the performance of the system in use outside controlled test conditions. By definition, it is not possible to use traditional accuracy scores to determine the system’s ability to handle OOG utterances. We can, however, provide guidelines and objectives that can be used in your system requirements definition.
Primary Goal of the ASR
The speech recognition system shall always produce a response that appears appropriate given the context. The application should not act upon recognized instructions if those instructions would not be executed in a real-world equivalent. For example, a pilot on short final approach is unlikely to accept an instruction to report left base without questioning it. The simulator and speech system should exhibit the same logic.
While the effectiveness of this primary goal will be difficult to measure, it is useful as a statement of intent. Protecting against the effects of OOG can be a seemingly never-ending and potentially expensive activity, so the primary goal can serve as a useful framework as you consider just how much protection you need. It is still possible to define requirements in a generalized manner that can be tested and measured.
System Criteria for OOG Management
System criteria should contribute to increasing system robustness, usability, and user satisfaction.
These criteria are:
The system shall attempt to eliminate false-positive recognition. A false positive occurs when the speech engine accepts a match against a defined phrase that is not the phrase the user actually spoke. This is often caused by the use of OOG phrases. A sensible measure for this requirement is semantic accuracy, as described in the post How to Specify ASR Accuracy.
The system shall not permit the simulator to process false positives if doing so has an unrealistic and negative impact on the traffic situation. Several available techniques can assist in meeting this criterion. For example, N-Best results and post-processing using an agent behavior system can be combined to produce a very effective method of trapping bad results. System effectiveness is measured by defining suitable metrics. However, the pass/fail threshold for each metric will depend on many factors, including budget and the type of simulation system. Arguably, the impact of false positives is less serious in some domains than in others.
Some false positives may be acceptable if a similar situation also happens in real-world communications. However, a false positive that happens consistently for the same situation is not acceptable. False positives should be considered a rare exception and not the rule.
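To make the N-Best idea concrete, here is a minimal sketch of how a list of alternative hypotheses might be filtered by a context check before the simulator acts on any of them. Everything in it is an assumption for illustration: the `Hypothesis` type, the confidence scale, the `min_confidence` threshold, and the `plausible_on_short_final` function, which is only a toy stand-in for an agent behavior system. A real speech engine will expose its N-Best results in its own format.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class Hypothesis:
    """One entry of a hypothetical N-Best list: decoded text plus engine confidence."""
    text: str
    confidence: float  # assumed to lie between 0.0 and 1.0


def select_hypothesis(
    n_best: Sequence[Hypothesis],
    is_plausible: Callable[[str], bool],
    min_confidence: float = 0.5,  # illustrative threshold, not a recommended value
) -> Optional[Hypothesis]:
    """Return the most confident hypothesis that is plausible in context, else None."""
    for hyp in sorted(n_best, key=lambda h: h.confidence, reverse=True):
        if hyp.confidence < min_confidence:
            break  # all remaining hypotheses are even less confident
        if is_plausible(hyp.text):
            return hyp
    return None  # reject: the caller should ask for a repeat rather than act


def plausible_on_short_final(text: str) -> bool:
    """Toy context check for an aircraft that is already on short final."""
    return "left base" not in text


n_best = [
    Hypothesis("report left base", 0.81),
    Hypothesis("cleared to land", 0.74),
    Hypothesis("report left downwind", 0.32),
]
chosen = select_hypothesis(n_best, plausible_on_short_final)
print(chosen.text if chosen else "rejected")  # prints "cleared to land"
```

The top-scoring hypothesis is skipped because it makes no sense in the current traffic situation, and the next plausible one is used instead. When nothing survives the check, returning `None` lets the application respond with a request to repeat instead of acting on a likely false positive.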
Appropriate Text-To-Speech Responses
The synthetic entities shall respond in a manner that seems natural to the user when an invalid request is recognized, whether the invalid request results from a user mistake or a speech recognition mistake.
Partial Recognition Results
The system shall be capable of processing partial recognition results where an utterance contains more than one instruction. For example, in pilot training or air traffic control, it is important for usability that the system can recognize the callsign separately from the issued phrases.
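One way to picture partial recognition is a parser that pulls out the callsign first, then collects whatever known instructions it can find and keeps the rest as unrecognized text. The sketch below works on already-decoded text and uses exact phrase matching, which is a simplification. The `PartialResult` type, the `parse_partial` function, and the sample callsigns and phrases are all hypothetical and are not taken from any particular speech engine.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Sequence


@dataclass
class PartialResult:
    callsign: Optional[str] = None
    instructions: List[str] = field(default_factory=list)
    unrecognized: str = ""


def _as_token_lists(phrases: Sequence[str]) -> List[List[str]]:
    """Lowercase and tokenize phrases, longest first so longer matches win."""
    token_lists = [p.lower().split() for p in phrases if p.split()]
    return sorted(token_lists, key=len, reverse=True)


def parse_partial(
    text: str, callsigns: Sequence[str], phrases: Sequence[str]
) -> PartialResult:
    """Split an utterance into callsign, known instructions, and leftover words."""
    tokens = text.lower().split()
    result = PartialResult()

    # The callsign is assumed to lead the utterance; match it on whole words.
    for cs in _as_token_lists(callsigns):
        if tokens[:len(cs)] == cs:
            result.callsign = " ".join(cs)
            tokens = tokens[len(cs):]
            break

    # Scan left to right, taking known phrases and keeping everything else.
    known = _as_token_lists(phrases)
    leftover: List[str] = []
    i = 0
    while i < len(tokens):
        for phrase in known:
            if tokens[i:i + len(phrase)] == phrase:
                result.instructions.append(" ".join(phrase))
                i += len(phrase)
                break
        else:
            leftover.append(tokens[i])
            i += 1
    result.unrecognized = " ".join(leftover)
    return result


result = parse_partial(
    "November one two three go around and keep it tight",
    callsigns=["november one two", "november one two three"],
    phrases=["cleared to land", "go around", "report left base"],
)
print(result.callsign)      # november one two three
print(result.instructions)  # ['go around']
print(result.unrecognized)  # and keep it tight
```

Because the callsign survives even when part of the utterance is out of grammar, the simulated entity can still address the right aircraft and ask for the remainder to be repeated, instead of discarding the whole transmission.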
OOG Inputs Summary
Dealing with out-of-grammar inputs outside of the core speech recognition capabilities is arguably the most important consideration when defining speech recognition requirements. Unfortunately, it is also a requirement that can be difficult to specify. Even a training system with 100% word accuracy is far from guaranteed to be a functional simulator. Performance measurement in acceptance testing is heavily weighted in favor of the vendor, because unsupported phrases and their impact are typically eliminated from test results. It is both worthwhile and necessary to consider OOG handling very carefully, and it is an area that warrants financial investment.
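One way to keep unsupported phrases in the acceptance results is to include OOG utterances in the test set deliberately and report them as a separate figure. The sketch below assumes each test trial records the meaning the speaker intended (or nothing, for an OOG utterance) and the meaning the system acted on (or nothing, if it rejected the input). The `Trial` type, the `summarize` function, and the metric names are hypothetical, and the in-grammar figure here is only a simple stand-in for a properly defined semantic accuracy measure.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Sequence


@dataclass(frozen=True)
class Trial:
    expected: Optional[str]  # intended meaning, or None for an OOG utterance
    accepted: Optional[str]  # meaning the system acted on, or None if rejected


def summarize(trials: Sequence[Trial]) -> Dict[str, Optional[float]]:
    """Report in-grammar accuracy and the OOG false-accept rate separately."""
    in_grammar = [t for t in trials if t.expected is not None]
    oog = [t for t in trials if t.expected is None]

    correct = sum(1 for t in in_grammar if t.accepted == t.expected)
    # An OOG utterance should be rejected; acting on it is a false positive.
    false_accepts = sum(1 for t in oog if t.accepted is not None)

    return {
        "in_grammar_accuracy": correct / len(in_grammar) if in_grammar else None,
        "oog_false_accept_rate": false_accepts / len(oog) if oog else None,
    }


trials = [
    Trial("cleared to land", "cleared to land"),
    Trial("go around", "go around"),
    Trial("report left base", "report left base"),
    Trial("report left base", "report right base"),  # misrecognized
    Trial(None, None),                # OOG, correctly rejected
    Trial(None, None),                # OOG, correctly rejected
    Trial(None, None),                # OOG, correctly rejected
    Trial(None, "cleared to land"),   # OOG, wrongly accepted
]
print(summarize(trials))
# {'in_grammar_accuracy': 0.75, 'oog_false_accept_rate': 0.25}
```

Reporting the two numbers side by side makes it harder for a strong in-grammar score to hide poor OOG behavior. The acceptable value for each figure remains a project decision.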