ASR Accuracy and Usability
In an earlier post, A Word About Accuracy, we discussed the various ASR accuracy methods used in requirements specifications and testing. The post “Word Error Rates are Misleading” explained why word accuracy is a poor choice in simulation and command & control applications. In this article, we suggest an alternative method of specifying ASR accuracy that better indicates the usability of your proposed system.
The Structure of the Language Model
For simulation and command & control applications, most if not all of the commands we might speak have a simple structure: an instruction or information statement with an optional qualifying data component.
An example of an information statement for a voice-activated television is “What Channel is Playing?” An instruction statement might be “Switch the Channel”. In the switch example, the instruction is clearly incomplete without the channel information. The proposed method of accuracy specification is to measure the information/instruction component (the semantics) separately from the data component, e.g., “The system must achieve a semantic accuracy of at least xx% and a data component accuracy of at least yy%.”
Semantic accuracy is simple to define: was the output from the ASR a semantic match to the input? In the following example, the ASR output does not match each of the speaker’s words, but the output from the ASR is as desired:
Speaker – Maintain a heading of two seven zero
ASR Output – Maintain heading two seven zero
The output from the ASR is a semantic match and is a heading command. In the next example, the data component is incorrect:
Speaker – Maintain heading two seven zero
ASR Output – Maintain heading two six zero
The output from the ASR is still a semantic match with the heading command, but we have to deal with the incorrect data component.
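As a rough illustration of how semantic accuracy could be scored over a test set, the sketch below maps each utterance to an intent label and counts the pairs whose intents agree. The keyword table, the intent labels, and the function names are hypothetical and exist only for this example; a real system would take the intent from its own language model or grammar.

```python
from typing import Optional

# Hypothetical keyword -> intent table (an assumption for illustration only).
INTENT_KEYWORDS = {
    "heading": "HEADING_COMMAND",
    "channel": "CHANNEL_COMMAND",
}


def intent_of(utterance: str) -> Optional[str]:
    """Return the intent label of an utterance, or None if no keyword matches."""
    words = utterance.lower().split()
    for keyword, intent in INTENT_KEYWORDS.items():
        if keyword in words:
            return intent
    return None


def semantic_accuracy(pairs) -> float:
    """Percentage of (spoken, recognized) pairs whose intents match."""
    if not pairs:
        raise ValueError("test set is empty")
    matches = 0
    for spoken, recognized in pairs:
        expected = intent_of(spoken)
        # A pair counts only if the spoken phrase has a known intent
        # and the ASR output resolves to the same intent.
        if expected is not None and expected == intent_of(recognized):
            matches += 1
    return 100.0 * matches / len(pairs)


test_pairs = [
    # Dropped words, same meaning: a semantic match.
    ("Maintain a heading of two seven zero", "Maintain heading two seven zero"),
    # Wrong digit, same meaning: still a semantic match.
    ("Maintain heading two seven zero", "Maintain heading two six zero"),
    # Command word misrecognized: not a semantic match.
    ("Maintain heading two seven zero", "Maintain speed two seven zero"),
]

print(f"Semantic accuracy: {semantic_accuracy(test_pairs):.1f}%")
```

Note that the second pair counts as a semantic match even though a digit is wrong; that error is scored separately as part of the data component.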
Data Component Accuracy
The primary measure of usability of an ASR-enabled system is semantic accuracy. Data accuracy becomes irrelevant if the semantics are incorrect.
In human-in-the-loop simulations, we can resolve some data component errors through clarification and conversation (for more information see Here).
For command & control, e.g., a voice interface to an aircraft cockpit, data component accuracy can be critical. A critical system does not necessarily require 100% data component accuracy. In a system that provides voice input to an aircraft autopilot, we can implement data validation methods. The application of ASR in safety-critical applications is a separate topic. A human factors study in safety-critical domains will reveal how and when data validation is required.
Measuring the data component accuracy is again relatively simple. In the earlier examples using headings, we calculate the percentage of digits recognized correctly from the total digits in the test data set. However, we need to consider what constitutes qualifying data. Using another heading command as an example:
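To make the idea of data validation concrete, here is a minimal sketch of one possible check on a recognized heading. It assumes, purely for illustration, that a heading is spoken as exactly three digit words and must fall between 000 and 360 degrees; the function name and the rules are hypothetical, not taken from any particular autopilot interface.

```python
from typing import List, Optional

DIGIT_WORDS = {
    "zero": 0, "one": 1, "two": 2, "three": 3, "four": 4,
    "five": 5, "six": 6, "seven": 7, "eight": 8, "nine": 9,
}


def parse_heading(digit_words: List[str]) -> Optional[int]:
    """Convert three spoken digit words to a heading in degrees.

    Returns None when the input fails validation (assumed rules:
    exactly three known digit words, value between 0 and 360).
    """
    if len(digit_words) != 3:
        return None
    try:
        digits = [DIGIT_WORDS[word.lower()] for word in digit_words]
    except KeyError:
        # A token that is not a digit word was recognized.
        return None
    heading = digits[0] * 100 + digits[1] * 10 + digits[2]
    return heading if 0 <= heading <= 360 else None


print(parse_heading(["two", "seven", "zero"]))   # valid heading
print(parse_heading(["four", "seven", "zero"]))  # out of range, rejected
print(parse_heading(["two", "seven"]))           # too few digits, rejected
```

A check like this only rejects values that cannot be valid. It cannot detect a plausible but wrong value, such as two six zero recognized in place of two seven zero, which is why confirmation or read-back steps may still be needed.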
Turn Left Heading two seven zero
The instruction given is still a heading command, but we have added the condition of Left. The heading command has a <digits> data component and a <direction> data component.
Don’t worry; this does not imply that we need to define in advance the qualifying components of each command we might issue. We can add to our specification the following definitions:
Digit Accuracy Data Component – The percentage of all digits in the test data that are recognized correctly.
Qualifying Data Component – The percentage of correctly recognized components that directly impact the semantic meaning of the instruction, e.g., left/right, up/down.
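The two definitions above could be measured along the following lines. This is a sketch under stated assumptions: digits and qualifiers are picked out by simple word lists, and reference and recognized tokens are compared position by position, which does not handle inserted or dropped words the way a proper alignment would. The vocabularies and function names are illustrative only.

```python
DIGITS = {"zero", "one", "two", "three", "four",
          "five", "six", "seven", "eight", "nine"}
# Assumed qualifier vocabulary, taken from the examples in the definition.
QUALIFIERS = {"left", "right", "up", "down"}


def extract(utterance: str, vocabulary: set) -> list:
    """Return the words of the utterance that belong to the vocabulary, in order."""
    return [word for word in utterance.lower().split() if word in vocabulary]


def component_accuracy(pairs, vocabulary: set) -> float:
    """Percentage of reference tokens recognized correctly at the same position."""
    total = 0
    correct = 0
    for spoken, recognized in pairs:
        reference = extract(spoken, vocabulary)
        hypothesis = extract(recognized, vocabulary)
        total += len(reference)
        # zip stops at the shorter list, so missing tokens count as errors.
        correct += sum(1 for ref, hyp in zip(reference, hypothesis) if ref == hyp)
    if total == 0:
        raise ValueError("no tokens from this vocabulary in the test data")
    return 100.0 * correct / total


test_pairs = [
    ("Turn left heading two seven zero", "Turn left heading two six zero"),
    ("Turn right heading zero nine zero", "Turn left heading zero nine zero"),
]

digit_accuracy = component_accuracy(test_pairs, DIGITS)
qualifier_accuracy = component_accuracy(test_pairs, QUALIFIERS)
print(f"Digit accuracy: {digit_accuracy:.1f}%")
print(f"Qualifying data accuracy: {qualifier_accuracy:.1f}%")
```

Reporting the two figures separately keeps a wrong direction (left instead of right) from being hidden among a large number of correctly recognized digits.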
When combined with the accuracy requirements discussed earlier, we have a reasonable expectation of a usable system.
Specifying ASR accuracy in this way is a much-preferred approach for supported phrases, but it is meaningless for unsupported phrases, that is, phrases not included in the original specification. It is easy to say that if phrases were not requested, we can ignore them. In many applications, however, such as those used for training, you cannot prevent the use of non-standard terminology, and an ASR-enabled system that ignores such phrases would produce unpredictable and unwanted consequences. How do we deal with unsupported phrases? Coming soon!