Sphinx4 1.0 beta4 Is Released. What's next?

So, almost according to schedule, sphinx4 was released yesterday. Check the notes at


Most notable improvements were already discussed here, so let me try to plan what the next release will be. Trying to be realistic in plans, I don't want to promise everything at once. Here is some attempt to forecast the next release notes

The biggest issue with sphinx4 is actually documentation. Current poll on CMUSphinx website clearly shows that. Personally I sometimes think that perfect documentation will not help if system doesn't work, but at least it will make product attractive and easy to use. My idea is that we need to have more developer-level documentation - tutorial, examples, task-oriented howtos. It's unlikely we'll be able to write something that is good enough as textbook on speech technologies. But we need to prove the point that it's possible to build ASR system without understanding who is Welch.

On the code side, we face a biggest challenge since sphinx4 was designed. We need to move to the multipass system. It's not just about rescoring, it's about plugging diarization framework from LIUM, it's also about making sphinx4 suitable for both batch and live applications. That's the serious issue.

The reason is that currently sphinx4 architecture is flow-oriented. It's built like a single pipe of components each passing audio to other. This is good for live applications, but not so good for batch ones. You get troubles when you need to split pipe or merge it later. In batch application one could have a huge benefit from looking on recording as a whole and returning to recording multiple times. For example, you could estimate noise level properly and just cleanup audio on the second pass. Such multipass decoding doesn't well fit into pipe paradigm. On the other side, changing it to purely batch will create issues for live applications.

So we are in trouble. We have to invent some combined scheme probably and create a hybrid of pipe and batch approaches. I was thinking about knowledge base scheme when information about stream is stored in some database as processing goes. Database cleanup policies could emulate both pipe (when database is immediately cleaned) and batch approaches (when database is kept even over sessions). Festival utterances remind me such data processing scheme between. Anyway, this idea is not finalized yet.

We also expect to see a lot of movement from CMUSphinx Workshop in Dallas and in Google Summer of code participation. I hope issues described above and some more interesting issuses will be resolved till next release in August. Let's discuss the rest then!


Post a Comment