Senone Tree Implementation For Sphinx4

I spent last month working on a senone tree linguist for sphinx4 as part of Nexiwave's sphinx4 performance project. Well, mostly I was fixing bugs in my initial implementation. The core idea of the senone tree, which was suggested to me by Bhiksha, is the following. A lextree is a representation of all possible words in a dictionary, built from triphones. The lextree is used to explore the search space during decoding. The very good thing is that, since the number of HMMs is rather small compared to the number of triphones (40000 vs 100000), the lextree is a rather compact representation of the search space.
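To make the lextree idea concrete, here is a minimal sketch of a lexical prefix tree in Python. It is only an illustration and not the sphinx4 code: it uses context-independent phones instead of triphones or HMMs, and the toy dictionary and the names `Node`, `build_lextree` and `count_nodes` are made up for this example. It shows why the tree is compact: words that start with the same phones share nodes.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    # Hypothetical node type for this sketch: one node per phone on a path.
    children: dict = field(default_factory=dict)  # phone -> child Node
    words: list = field(default_factory=list)     # words that end at this node


def build_lextree(dictionary):
    """Build a prefix tree from a {word: [phones]} pronunciation dictionary."""
    root = Node()
    for word, phones in dictionary.items():
        node = root
        for phone in phones:
            # Reuse the existing child if another word already has this prefix.
            node = node.children.setdefault(phone, Node())
        node.words.append(word)
    return root


def count_nodes(node):
    """Count this node and all nodes below it."""
    return 1 + sum(count_nodes(child) for child in node.children.values())


# Toy pronunciation dictionary (assumption, not taken from a real model).
toy_dictionary = {
    "cat": ["K", "AE", "T"],
    "cap": ["K", "AE", "P"],
    "can": ["K", "AE", "N"],
    "dog": ["D", "AO", "G"],
}

tree = build_lextree(toy_dictionary)
total_phones = sum(len(phones) for phones in toy_dictionary.values())
node_count = count_nodes(tree) - 1  # do not count the empty root

print(f"phones in flat dictionary: {total_phones}, nodes in lextree: {node_count}")
```

In this toy case the three words that begin with "K AE" share two nodes, so the tree needs fewer nodes than the flat list of pronunciations has phones. A real decoder attaches acoustic model units to the nodes, which this sketch leaves out.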




Speech Decoding Engines, Part 2. SCARF, The Next Big Thing In Machine Learning

It seems that HMMs will not stay forever. If you aren't tied to speech and you follow the big things in machine learning, you have probably heard about that new thing: Conditional Random Fields. According to the recently started but very promising Metaoptimize, it's one of the most influential ideas in machine learning.

And, surprisingly, you can already apply this to speech recognition, thanks to researchers at Microsoft Research, including Geoffrey Zweig and Patrick Nguyen. It's SCARF, a Segmental Conditional Random Field Speech Recognition Toolkit, which is at version 0.5 now. You can download its sources from the Microsoft Research website.


Testing CMUSphinx with Hudson



Like every high-quality product, CMUSphinx spends a lot of effort on testing. That isn't a trivial task, because you need to make sure that all the important parameters are improved or at least not regressed. That includes decoding accuracy, speed and API specs. Sometimes changes improve one thing and make another worse. Things are going to change with the deployment of the Hudson continuous integration system.

A quite sophisticated system of tests was created to track changes. It included Perl scripts, various shell bits, a MySQL database and even commits to the CVS repository. It was also spamming the mailing list all day with long and unreadable emails. Another bad thing was that it was based on private commercial data like the WSJ or TIDIGITS databases, but now everything is changing with the Voxforge test set. Our goal is to let you test and optimize the system yourself.
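As an illustration of what an accuracy regression check could look like, here is a small Python sketch. It is not part of the CMUSphinx test suite or of Hudson: the function names `word_error_rate` and `has_regressed` and the `tolerance` parameter are assumptions made for this example. Accuracy is measured as word error rate, the word-level edit distance between the reference transcript and the decoder output divided by the number of reference words.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must not be empty")

    # prev[j] is the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        cur = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing in hypothesis
            insertion = cur[j - 1] + 1  # extra word in hypothesis
            cur.append(min(substitution, deletion, insertion))
        prev = cur
    return prev[-1] / len(ref)


def has_regressed(baseline_wer, new_wer, tolerance=0.0):
    """Report a regression when the new error rate exceeds the baseline."""
    return new_wer > baseline_wer + tolerance


# Made-up transcripts, only to show the calls.
wer = word_error_rate("turn the light on", "turn light on please")
print(f"WER: {wer:.2f}, regressed vs 0.40 baseline: {has_regressed(0.40, wer)}")
```

A continuous integration job could run such a check after every build and fail when `has_regressed` returns true. Speed and API checks would need their own tests.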



HTK Competition Voting

Thanks everyone for your feedback, results are really interesting to see.

HTK competition is something that I have been worrying about for a long time. One key issue I see is that the htk-users mailing list definitely has far deeper discussions about ASR than we have on our forum. Hopefully, the situation will change.

Anyway, our goal is still to provide very accurate speech recognition, and this is not yet a solved task, with many issues both in usability and accuracy. So we can definitely learn from each other and improve our projects.