Face Recognizers, Bloom Filters and Application to Speech Recognition

In the waterfall of scientific papers we have today, I continually face the problem of picking out the important high-level approaches to a problem. Many ideas are certainly useful and improve accuracy, but they don't count as core ones, like yet another feature extraction algorithm that brings a 2% performance improvement. I really miss up-to-date, high-level reviews that would introduce the range of possible approaches along with their advantages and disadvantages. I was counting on books for that, but unfortunately they aren't as accessible as papers.

Some time ago I read the core face detection paper by Viola and Jones on cascades of Haar-like features for object detection. It struck me that their method, which turned out to be very fruitful in face and object detection, never made it into common practice in speech recognition.

Basically, the idea of their method is that the search space can be reduced significantly with a set of very weak classifiers. For example, you can easily determine that there is no face on green grass and skip that region. The fruitful observation is that negatives can be classified much more accurately than positives. Putting such classifiers into a cascade makes the search space tiny and recognition fast and efficient. It's certainly not the only algorithm of this type; another one I came across recently is the Bloom filter, which uses almost the same trick for efficient hash lookup.
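To make the "negatives are the easy part" point concrete, here is a minimal Bloom filter sketch in Python. It is only an illustration: the class name, the sizes and the use of SHA-256 to derive bit positions are my own assumptions, not anything taken from the Viola and Jones paper. The property that matters is that a "no" answer is always correct, while a "yes" may be a false positive and still needs a real lookup.

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter: a 'no' is definite, a 'yes' only means 'maybe'."""

    def __init__(self, num_bits=1024, num_hashes=3):
        # Sizes are arbitrary illustration values, not tuned for any workload.
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for readability

    def _positions(self, item):
        # Derive several bit positions by salting one hash with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item):
        # False: the item was never added. True: it may have been added.
        return all(self.bits[pos] for pos in self._positions(item))


seen = BloomFilter()
for word in ["speech", "recognition", "cascade"]:
    seen.add(word)

# Every added item is reported as a possible member (no false negatives).
print(all(seen.might_contain(w) for w in ["speech", "recognition", "cascade"]))
```

Only the items that pass this cheap check need the expensive exact search, which is the same division of labour as in a detection cascade.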

The transfer of this idea to ASR is rather straightforward. We need to train weak classifiers that reject phone hypotheses for a given set of frames. That should be quite easy with an SVM or something built on top of an existing HMM segmentation. Next, we could apply the same idea to the language model and reject hypotheses that aren't possible in the language.
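A rough sketch of what such a rejection cascade over phone hypotheses could look like is below. Everything in it is hypothetical: I assume the first feature of each frame is an energy value normalized to [0, 1], the thresholds are made up, and the hand-written rules only stand in for trained weak classifiers such as the SVMs mentioned above.

```python
from typing import Callable, List, Sequence

Frames = Sequence[Sequence[float]]        # one feature vector per frame
Rejector = Callable[[str, Frames], bool]  # returns True to reject a phone

VOWELS = {"aa", "iy", "uw"}  # tiny illustrative phone subset


def mean_energy(frames: Frames) -> float:
    # Assumption: feature 0 of every frame is a normalized energy.
    if not frames:
        return 0.0
    return sum(frame[0] for frame in frames) / len(frames)


def reject_vowel_if_quiet(phone: str, frames: Frames) -> bool:
    # Stand-in weak classifier: a vowel is unlikely in near-silent frames.
    return phone in VOWELS and mean_energy(frames) < 0.2


def reject_silence_if_loud(phone: str, frames: Frames) -> bool:
    # Stand-in weak classifier: silence is unlikely in loud frames.
    return phone == "sil" and mean_energy(frames) > 0.5


def surviving_phones(candidates: Sequence[str], frames: Frames,
                     cascade: Sequence[Rejector]) -> List[str]:
    # A phone survives only if no stage rejects it; any() stops at the
    # first rejecting stage, so later stages are skipped for that phone.
    return [phone for phone in candidates
            if not any(reject(phone, frames) for reject in cascade)]


cascade = [reject_vowel_if_quiet, reject_silence_if_loud]
quiet_segment = [[0.1], [0.05]]
print(surviving_phones(["aa", "s", "sil"], quiet_segment, cascade))
# prints ['s', 'sil']
```

Only the surviving phones would then be scored by the full acoustic model. The same loop could take a language-model stage as one more rejector, though that is not shown here.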

I haven't seen any papers on this; probably I need to search more. The idea is certainly worth trying, and it should become part of common ASR practice, alongside discriminative training, adaptation with linear regression, and multipass search.

