Senone Tree Implementation For Sphinx4

I spent the last month working on a senone tree linguist for sphinx4 as part of Nexiwave's sphinx4 performance project. Well, mostly I was fixing bugs in my initial implementation. The core idea of the senone tree, which was suggested to me by Bhiksha, is the following. A lextree is a representation of all possible words in a dictionary, built from triphones, and it is used to explore the search space during decoding. The very good thing is that since the number of HMMs is rather small compared to the number of triphones (40000 vs 100000), the lextree is a rather compact representation of the search space.

The senone tree takes advantage of the internal representation of each triphone. Indeed, a triphone is built from 3 senones, and the number of senones is even smaller: just 10000 for a big model. Accordingly, we can transform our graph into a far more compact structure. There are also some technical advantages of our senone tree implementation: an efficient hash map that doesn't waste memory, and a simple search space structure, since we don't use helper states like LexTreeEndUnitState for word end nodes and LexTreeNonEmittingHMMState for non-emitting states. Simple code is also an advantage.
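To make the sharing idea concrete, here is a minimal sketch of a prefix tree keyed by senone IDs. It is not the actual sphinx4 linguist code: the class name SenoneTreeSketch, the words and the senone IDs are all made up for illustration, and each triphone is simply assumed to expand to 3 senone IDs. The point is only that two triphones which are different HMMs can still share their leading senones, so the tree ends up with fewer nodes than a tree built from whole HMMs.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical illustration only; not a sphinx4 class. */
public class SenoneTreeSketch {

    /** One tree node; children are keyed by senone ID. */
    static final class Node {
        final Map<Integer, Node> children = new HashMap<>();
        String word; // set only on the node where a word's senone sequence ends
    }

    private final Node root = new Node();
    private int nodeCount = 0; // number of nodes created, root excluded

    /**
     * Adds a word given as a flat sequence of senone IDs
     * (assumption: 3 senones per triphone, concatenated in order).
     * Homophones would overwrite each other here; a real linguist
     * would keep a list of words per node.
     */
    public void addWord(String word, int[] senoneIds) {
        Node node = root;
        for (int id : senoneIds) {
            Node next = node.children.get(id);
            if (next == null) {
                // No existing path for this senone: create a new node.
                next = new Node();
                node.children.put(id, next);
                nodeCount++;
            }
            node = next;
        }
        node.word = word;
    }

    public int nodeCount() {
        return nodeCount;
    }

    public static void main(String[] args) {
        SenoneTreeSketch tree = new SenoneTreeSketch();
        // Made-up senone IDs: the first triphones of the two words are
        // different HMMs, but they share their first two senones (10, 11).
        tree.addWord("cat", new int[] {10, 11, 12, 20, 21, 22});
        tree.addWord("cap", new int[] {10, 11, 13, 20, 21, 23});
        System.out.println("senone tree nodes: " + tree.nodeCount());
    }
}
```

In this toy case a tree built from whole HMMs could not share anything between the two words, because their first triphones differ, so it would need 2 words x 2 HMMs x 3 states = 12 emitting states. The senone-level tree reuses the common prefix and needs fewer nodes.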

Experiments also show that the senone tree gives nice results. It's a little bit more accurate (by a few percent, which is not significant) and it is significantly faster (2x faster in growth, 20% faster overall, 20k active tokens vs 40k before). That's a nice improvement for the Nexiwave engine.

This work makes me think that we need to give a different answer to our core question: "How to improve the accuracy?" Of course the traditional answers, like optimizing parameters and training a better model, are still valid, but they are secondary. The primary answer should be: implement the proper features in the engine and you'll get the accuracy improvement you need.

