No More Word Error Rate


What he said:

Hi Brad, it's Mike. I had a lunchtime appointment go long and I am bolting back to Evans. I'll be there shortly. See you soon. Thanks.

What Google Voice heard:

That it's mike. I had a list of women go a long and I am old thing. Back evidence. I'll be there for me to you soon. Thanks.

The interesting thing is that it got 17 out of the 26 words right, but those 17 words convey almost none of the information in the message...
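To put a number on it, here is a minimal sketch of how word error rate is usually computed: the word-level edit distance (substitutions + deletions + insertions) divided by the number of reference words. The function names and the tokenization (lowercase, punctuation dropped) are my own assumptions for illustration; real scoring tools apply their own normalization rules.

```python
import re

REF = ("Hi Brad, it's Mike. I had a lunchtime appointment go long and "
       "I am bolting back to Evans. I'll be there shortly. See you soon. Thanks.")
HYP = ("That it's mike. I had a list of women go a long and I am old thing. "
       "Back evidence. I'll be there for me to you soon. Thanks.")


def tokenize(text):
    # Assumed normalization: lowercase, keep only letters and apostrophes,
    # so "Mike." and "mike" count as the same word.
    return re.findall(r"[a-z']+", text.lower())


def word_errors(reference, hypothesis):
    """Return (edit distance in words, number of reference words)."""
    ref, hyp = tokenize(reference), tokenize(hypothesis)
    # prev[j] holds the distance between the reference prefix processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # deletion: reference word missing
                curr[j - 1] + 1,         # insertion: extra hypothesis word
                prev[j - 1] + (r != h),  # match (cost 0) or substitution
            ))
        prev = curr
    return prev[-1], len(ref)


def word_error_rate(reference, hypothesis):
    errors, n_ref = word_errors(reference, hypothesis)
    if n_ref == 0:
        raise ValueError("reference must contain at least one word")
    return errors / n_ref


if __name__ == "__main__":
    errors, n_ref = word_errors(REF, HYP)
    print(f"{errors} errors / {n_ref} reference words "
          f"= WER {word_error_rate(REF, HYP):.2f}")
```

With this tokenization the voicemail above scores 13 errors against 26 reference words, a WER of 50%. Note that the metric charges the same price for losing "Brad" or "Evans" as for inserting an extra "a", which is exactly why it says so little about whether the message got through.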

I found this paper on the subject:

Is Word Error Rate a Good Indicator for Spoken Language Understanding Accuracy
Ye-Yi Wang and Alex Acero

It is a conventional wisdom in the speech community that better speech recognition accuracy is a good indicator for better spoken language understanding accuracy, given a fixed understanding component. The findings in this work reveal that this is not always the case. More important than word error rate reduction, the language model for recognition should be trained to match the optimization objective for understanding. In this work, we applied a spoken language understanding model as the language model in speech recognition. The model was obtained with an example-based learning algorithm that optimized the understanding accuracy. Although the speech recognition word error rate is 46% higher than the trigram model, the overall slot understanding error can be reduced by as much as 17%.

We definitely need to address this in sphinx4: optimizing for word error rate alone is not enough.
