<!-- Speech Recognition Accuracy Measures | verbyx.com | published 2019-04-26, modified 2020-08-24 | https://verbyx.com/2019/04/26/speech-recognition-accuracy/ -->
<h2>Speech Recognition Accuracy Measures</h2>

<p>In this first post on speech recognition accuracy measures, we describe the common methods for measuring error/accuracy rates. In a follow-up post, we will discuss why those error-rate measurements might not be a practical way to describe your simulation ASR requirements.</p>

<h2>Word Accuracy</h2>

<p>The most commonly quoted metric of speech recognition accuracy is Word Accuracy (Wacc): the proportion of correctly recognized words out of the total number of words spoken. It can be a useful measurement when validating language models. Word Accuracy is calculated by comparing the returned sentence with a known, accurate transcription of the test audio.</p>

<figure>
<p>Wacc = (W - Ws - Wi - Wd) / W</p>
<figcaption>The formula for calculating Word Accuracy</figcaption>
</figure>

<p>Wacc is evaluated using the formula above, where:</p>

<ul>
<li>W is the total number of words to be recognized.</li>
<li>Ws is the number of substituted words (the result replaced a word in the transcription with another word).</li>
<li>Wi is the number of inserted words (a word was added that was not in the transcription).</li>
<li>Wd is the number of deleted words (a word that was in the transcription was missing from the result).</li>
</ul>
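<p>In practice, the substitution, insertion, and deletion counts come from aligning the hypothesis against the reference with a minimum edit-distance alignment. As an illustrative sketch (not from the original post), a minimal Python implementation of word accuracy looks like this:</p>

```python
def word_accuracy(reference, hypothesis):
    """Compute Wacc = (W - Ws - Wi - Wd) / W via edit-distance alignment."""
    ref = reference.split()
    hyp = hypothesis.split()
    # dp[i][j] = minimum edits to turn ref[:i] into hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i  # i deletions
    for j in range(len(hyp) + 1):
        dp[0][j] = j  # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = dp[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            dp[i][j] = min(sub, dp[i - 1][j] + 1, dp[i][j - 1] + 1)
    errors = dp[len(ref)][len(hyp)]  # Ws + Wi + Wd combined
    return (len(ref) - errors) / len(ref)

print(word_accuracy(
    "i would like to speak to the accounting department please",
    "i would like to speak with the accounting department please"))  # → 0.9
```

<p>Production scoring tools such as NIST's sclite implement the same dynamic-programming alignment, with additional bookkeeping to report the three error types separately.</p>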

<h2>Digit Accuracy</h2>

<p>In many applications, Digit Accuracy is a very important metric when specifying speech recognition accuracy measures. Digit Accuracy considers only the numbers in a sentence and is calculated in an identical manner to Word Accuracy. In some applications, it can be difficult to achieve high digit accuracies, so if digits are important to your application, make sure that you specifically ask about digit performance.</p>
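<p>One way to score digits only is to filter both strings down to their digit tokens before comparison. The sketch below is illustrative, not from the original post, and uses a simplified position-for-position match rather than a full alignment:</p>

```python
import re

def digit_tokens(text):
    """Keep only tokens that are whole numbers, e.g. 'contact tower on 118 7' -> ['118', '7']."""
    return [t for t in text.split() if re.fullmatch(r"\d+", t)]

def digit_accuracy(reference, hypothesis):
    """Fraction of reference digit tokens matched position-for-position (simplified: no alignment)."""
    ref, hyp = digit_tokens(reference), digit_tokens(hypothesis)
    correct = sum(r == h for r, h in zip(ref, hyp))
    return correct / len(ref) if ref else 1.0

print(digit_accuracy("contact tower on 118 7", "contact tower on 118 9"))  # → 0.5
```

<p>A full scorer would align the digit sequences with the same edit-distance procedure used for word accuracy, so that an inserted or deleted digit does not shift every subsequent comparison.</p>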

<h2>Sentence Accuracy</h2>

<p>Sentence accuracy has limited value when defining speech recognition accuracy measures.</p>

<p>When measuring sentence accuracy, the returned sentence (hypothesis) is compared against the reference transcription of the spoken audio. If the hypothesis does not match the reference exactly, the sentence accuracy for that utterance is 0%.</p>

<p>For example, the user says, “I would like to speak <strong>to</strong> the accounting department please” and the ASR returns, “I would like to speak <strong>with</strong> the accounting department please”. The hypothesis is almost identical to the reference, but this still counts as a sentence error. In this example, a single word out of 10 was incorrectly recognized, giving a 90% word accuracy while the sentence accuracy is 0%.</p>
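<p>Because each utterance is scored all-or-nothing, sentence accuracy over a test set is simply the fraction of exact matches. A minimal sketch (illustrative, not from the original post):</p>

```python
def sentence_accuracy(pairs):
    """Fraction of (reference, hypothesis) pairs that match exactly, word for word."""
    exact = sum(ref.split() == hyp.split() for ref, hyp in pairs)
    return exact / len(pairs)

tests = [
    ("i would like to speak to the accounting department please",
     "i would like to speak with the accounting department please"),  # 1 substitution → miss
    ("yes", "yes"),                                                   # exact → hit
]
print(sentence_accuracy(tests))  # → 0.5
```

<p>Note how harsh the metric is: the first pair scores 90% word accuracy but contributes zero to sentence accuracy.</p>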

<p>Note that a true measure of error rates should be calculated over many test utterances from a wide assortment of speakers; results from a single utterance are not representative of overall performance. Additionally, if the speaker uses words or phrases that are not part of the defined grammar, the error for that word or phrase is removed from the score.</p>

<h2>Speech Recognition Accuracy – Semantic Errors</h2>

<p>Arguably the most useful measurement of accuracy in a command-and-control application is the semantic error (or command error). A semantic error measures whether the desired end result occurred in the application.</p>

<p>Semantic error is useful in validating application design. For example, if the user said “YES”, the ASR returned “YES”, and the “YES” action was executed, it is clear that the desired result was achieved.</p>

<p>What happens if the ASR returns text that does not exactly match the utterance? For example, the user said “NOPE”, the ASR returned “NO”, and the “NO” action was executed. Semantically, this should also be considered a successful dialog. If you measure performance using word error rate (WER) or sentence error rate (SER), this example results in a poor score, yet in practice the ASR produced the desired outcome.</p>
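<p>Semantic scoring collapses different surface forms onto the same action before comparing. The sketch below is a hypothetical illustration (the intent map and function names are ours, not from the post):</p>

```python
# Hypothetical intent map: many surface forms collapse to one action.
INTENTS = {
    "yes": "CONFIRM", "yeah": "CONFIRM", "yep": "CONFIRM", "affirmative": "CONFIRM",
    "no": "REJECT", "nope": "REJECT", "negative": "REJECT",
}

def semantic_match(spoken, hypothesis):
    """True if the spoken word and the ASR hypothesis map to the same action,
    even when the words themselves differ."""
    s = INTENTS.get(spoken.lower())
    h = INTENTS.get(hypothesis.lower())
    return s is not None and s == h

print(semantic_match("NOPE", "NO"))  # → True  (a word error, but a semantic success)
print(semantic_match("YES", "NO"))   # → False
```

<p>Under this scoring, the “NOPE” → “NO” exchange counts as correct, which matches what the user actually experienced.</p>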

<h2>What does Speech Recognition Accuracy mean for you?</h2>

<p>When trying to determine the suitability of one ASR over another, speech recognition accuracy measures are an important indicator of expected performance. When evaluating accuracy, think carefully about which measure is being used, how those measures apply to your specific application, and what constraints were in place during testing. For example, a 1% error rate on a 100-name telephone directory, using a context-free grammar, is easier to achieve than a 5% error rate using a statistical language model for free-speech applications. ASR accuracy/error rates alone are not enough to determine the suitability of one vendor over another, so be wary of any vendor that tries to convince you based on their headline numbers alone.</p>

<h2>Free Download</h2>

<p>This free download provides additional guidance on the topic. It also includes advice for writing general requirements for the purchase of a speech recognition system.</p>

<p>P.S. Just in case you choose to ignore this advice, VRX is 99.9999% accurate! 😊</p>
