On CMUCLMTK

I've rebuilt the Nexiwave language models and met some issues which it would be nice to solve one day. The CMU language model toolkit is a nice, simple piece of software, but it definitely lacks many features required to build a good language model. So, thinking about the features a language modelling toolkit could provide, I created a list.
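For context, the core task any such toolkit must handle — counting n-grams and turning the counts into probabilities — can be sketched in a few lines. This is a toy illustration with unsmoothed maximum-likelihood estimates, not CMUCLMTK's actual implementation:

```python
from collections import Counter

def ngram_counts(tokens, n):
    """Count n-grams in a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bigram_prob(tokens, w1, w2):
    """Maximum-likelihood P(w2 | w1) from raw counts (no smoothing)."""
    bigrams = ngram_counts(tokens, 2)
    unigrams = ngram_counts(tokens, 1)
    return bigrams[(w1, w2)] / unigrams[(w1,)]

corpus = "the cat sat on the mat".split()
print(bigram_prob(corpus, "the", "cat"))  # "cat" follows one of two "the" -> 0.5
```

A real toolkit adds exactly the things missing here: smoothing, cutoffs, vocabulary handling and an interchange format like ARPA.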


Decoding of Compressed Low-Bitrate Speech

I've spent some time optimizing accuracy for 3gp speech recordings from mobile phones. 3gp is a container format used on most mobile devices nowadays, with the speech inside compressed using AMR-NB. I converted audio to AMR-NB and back, extracted PLP features and trained a few models on that. The result is not encouraging: accuracy is worse than with the stock model, both on the original and on the compressed/decompressed audio. Not much worse, but significantly worse.
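Accuracy comparisons like the one above are usually reported as word error rate. For reference, here is a minimal, decoder-independent WER computation (the standard word-level edit distance):

```python
def wer(reference, hypothesis):
    """Word error rate: word-level Levenshtein distance / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + sub)  # substitution or match
    return d[len(ref)][len(hyp)] / len(ref)

print(wer("the cat sat on the mat", "the cat sat on a mat"))  # 1 error / 6 words
```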

It looks like traditional HMM issues such as the frame independence assumption play a role here, which is confirmed by the papers I found. This paper, for example, is quite useful:

Vladimir Fabregas Surigué de Alencar and Abraham Alcaim. On the Performance of ITU-T G.723.1 and AMR-NB Codecs for Large Vocabulary Distributed Speech Recognition in Brazilian Portuguese

And this paper is good too:

Patrick Bauer, David Scheler, Tim Fingscheidt. WTIMIT: The TIMIT Speech Corpus Transmitted Over the 3G AMR Wideband Mobile Network

I need to research this subject more. Surprisingly, there are only a few papers on it, far fewer than on reverberation. It looks like we either have to build a specialized frontend targeted at decoding low-bitrate compressed speech, or move to features more robust than PLP.

For now, I would state the problem as developing a speech recognition framework that provides good accuracy on:
  • Unmodified speech
  • Noise-corrupted speech
  • Music-corrupted speech
  • Codec-corrupted speech
  • Long-distance speech
A good system should decode well in all these cases.

Updates in SphinxTrain

Tired of explaining build issues over and over, I found the passion to step in and start a series of major changes in SphinxTrain:

  • Ported SphinxTrain to automake; the development branch you can try is here: https://cmusphinx.svn.sourceforge.net/svnroot/cmusphinx/branches/sphinxtrain-automake
  • Will increase SphinxTrain's dependency on sphinxbase, unifying the duplicated sources.
  • Will make training use an external SphinxTrain installation; no setup in the training folder will be required, only configuration. All scripts will live in share and libdir and will be installed system-wide. To try a new version, one will just need to change the path to SphinxTrain.
  • Will modify the scripts so that a database can be built and tested with a single command. No possibility of missing anything!
  • Will include automation for language-weight optimization on a development set; the improved training scripts will do everything required.
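The language-weight optimization mentioned above is essentially a grid search over decoding runs on the development set. A sketch of the idea, with a stubbed-in scoring function — the helper name `decode_wer`, the candidate weights and the WER numbers are all invented for illustration:

```python
def tune_lw(decode_wer, weights):
    """Pick the language weight giving the lowest WER on a development set.

    `decode_wer` is a hypothetical callback that decodes the dev set with a
    given language weight and returns the resulting word error rate.
    """
    best = min(weights, key=decode_wer)
    return best, decode_wer(best)

# Stub standing in for real decoding runs; the values are made up.
fake_wer = {6.5: 0.21, 8.0: 0.18, 9.5: 0.17, 11.0: 0.19}
best_lw, best_wer = tune_lw(fake_wer.get, [6.5, 8.0, 9.5, 11.0])
print(best_lw, best_wer)  # 9.5 0.17
```

In a real setup `decode_wer` would invoke the decoder on the dev set and score the output, which is exactly the step worth automating in the training scripts.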

I know Autotools aren't the best build system, but they are pretty straightforward. More importantly, the tools will follow common Unix practices and thus will be easier to use and understand.

Comments are welcome!

P.S.

We've made great progress on Nexiwave too. Check it out!

Backward Compatibility Issues

Just today I spent a few hours trying to figure out why a change in makeinfo's version output broke the binutils build. Well, it's an old bug, but we all get mad when backward compatibility breaks. Especially when it affects our software. Especially when we have neither the time nor the passion to fix it. My complaints rose to the roof, or probably even higher.

Life is a strange thing. Right after that I went ahead and broke sphinx4 backward compatibility in model packaging (again!). Now the models distributed with sphinx4 follow the SphinxTrain output format: all files are in a single folder, the model definition is named simply "mdef", and there is a feat.params. Things are very straightforward:

[shmyrev@gnome sphinx4]$ ls models/acoustic/wsj

dict license.terms means noisedict transition_matrices
feat.params mdef mixture_weights README variances

It will certainly help avoid confusion when new developers change a model, adapt it or train their own.

In the future I hope feat.params will be put to better use: automatically building the frontend, deriving feature extraction properties, holding metadata about the model and similar things. The shiny future is getting closer.
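To illustrate what "deriving feature extraction properties" could look like: feat.params is a plain text file of `-key value` lines, so a frontend could read it along these lines. This is a sketch; the sample parameter values below are examples, not a definitive set:

```python
def parse_feat_params(text):
    """Parse feat.params lines of the form '-key value' into a dict."""
    params = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        key, _, value = line.partition(" ")
        params[key.lstrip("-")] = value
    return params

sample = """-feat 1s_c_d_dd
-lowerf 130
-upperf 6800
-nfilt 25"""
print(parse_feat_params(sample)["feat"])  # 1s_c_d_dd
```

With something like this, the frontend configuration could be generated from the model itself instead of being duplicated by hand.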

I also removed the RM1 model from the distribution. I don't think anybody is using it.

So please don't complain; let's fix things now, before it's too late to fix them. One day we'll get everything in place and release the final sphinx4-1.0. After that we'll certainly be backward-compatible. I really like Java and Windows for their long-term backward-compatibility policies. We can do even better.