Sphinx4 migrated to git

This change started some time ago, but now it's mostly finished and announced. The tree can be found here:

https://sourceforge.net/scm/?type=git&group_id=257562

The discussion is here.

I'm glad to see this progress; big thanks to everyone involved: Joe, Piter and others.

About git itself I have mixed feelings. The advantages of a DVCS aren't obvious to me, and in the past I even gave up participating in one project after its migration to mercurial (it was http://linuxtv.org). The distributed nature increases complexity and confuses at least me. It's hard to understand where the latest changes were made, what the real state of things is and where development happens. Developers tend to add their changes to their own branches, and little effort is made to create a common branch. Also, among all DVCSs git is the worst in terms of usability. Sadly, GNOME is also migrating to git in the near future.

Every change has its black and white sides. There are many things I like in the new sphinx4, such as the clear split of the tests one can run. Some things are hard to understand, like the Rakefile migration. I'm worried about Windows users: how will they build sphinx4 now? Anyhow, let's hope the issues will be resolved and a new shiny release will appear very soon.

Russian GNOME 2.26

Russian GNOME 2.26 is 100% translated. Congratulations to the team for their hard work.

GNOME Summer of code tasks

I spent some time today trying to invent some interesting tasks for GNOME summer of code 2009. My favorite list for now is:
  • Text summarizer in Epiphany
  • Improved spell check for GEdit
  • Doxygen support for gtk-doc
  • Desktop-wide services for activity registration
  • Automatic workstation mode detection and more AI tasks desktop can benefit from
  • Cleanup of the Evolution interface where sent and received mail are grouped together
The list is probably too boring, but one should note that a summer is usually too short to implement something serious, and students are not as experienced as one would want them to be. Some of the tasks were rejected already, though that's not a big deal. I just find it discouraging that the list of officially proposed tasks is even more tedious.

Looking at this issue makes me think again about GNOME as a product on the market and the possible ways of its development. It seems that we are now at a point where feature sets among competitors have stabilized and it's hard to invent something new in the market: the so-called mature product stage, where it's important to polish and lower costs. A big step is required to shift the product to a new level. Probably I need to investigate research desktops that completely change the way users work with the system. For example, I'd love to see better AI support everywhere, like adaptive preferences; better stability and security with proper IPC and a service-based architecture; self-awareness services; a modern programming language. I'm not sure I'm brave enough for that, though.

HTK 3.4.1 is released

Amazing news, really. The new features of the release include:

1. The HTK Book has been extended to include tutorial sections on HDecode and discriminative training. An initial description of the theory and options for discriminative training has also been added.
2. HDecode has been extended to support decoding with trigram language models.
3. Lattice generation with HDecode has been improved to yield a greater lattice density.
4. HVite now supports model-marking of lattices.
5. Issues with HERest using single-pass retraining with HLDA and other input transforms have been resolved.
6. Many other smaller changes and bug fixes have been integrated.

The release is available on the HTK website.


Building interpolated language model

With a little poking around I managed to get a combined model from the database prompts and a generic model. The accuracy jumped significantly.

Sadly, cmuclmtk requires a lot of magic passes over the models to get lm_combine to work. Many thanks to Bayle Shanks of voicekey for writing up a recipe. So if you want to give it a try:

  • Download voice-keyboard
  • Unpack it
  • Train both language models
  • Process them with the scripts lm_combine_workaround
  • Process both with lm_fix_ngram_counts
  • Create a weight file like this (the weights could be different of course):
first_model 0.5
second_model 0.5
  • Combine models with lm_combine.
After all these steps you can enjoy a good language model suitable for dialog transcription.
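Conceptually, what lm_combine does with the weight file above is a linear interpolation of the two models' probabilities. Here is a toy sketch of that idea in Python; the unigram probabilities are made up for illustration, and real models of course use the full ARPA n-gram format rather than plain dictionaries:

```python
def combine(models, weights):
    """Linearly interpolate word probabilities from several models."""
    vocab = set().union(*models)
    return {
        w: sum(wt * m.get(w, 0.0) for m, wt in zip(models, weights))
        for w in vocab
    }

# Toy stand-ins for the two trained models from the steps above.
first_model = {"yes": 0.6, "no": 0.4}                # dialog prompts model
second_model = {"yes": 0.1, "no": 0.1, "the": 0.8}   # generic model

# Weights taken from the weight file: first_model 0.5, second_model 0.5.
combined = combine([first_model, second_model], [0.5, 0.5])
print(combined["yes"])  # 0.5*0.6 + 0.5*0.1 = 0.35
```

The interpolated distribution still sums to one, since each component does and the weights sum to one.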

Building the language model for dialogs

I'm looking into how to build a combined language model suitable for dialog decoding. I have quite a lot of dialog transcriptions, but in terms of coverage they aren't comparable with a generic model built from large corpora. It would be nice to combine them somehow to get the structure of the first model and the diversity of the second. One article I read says it's possible to just interpolate them linearly, so probably I just need to get better acquainted with the SRILM toolkit.
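Linear interpolation needs an interpolation weight, and a common way to pick it (roughly what SRILM's compute-best-mix estimates, via EM) is to minimize perplexity on held-out dialog text. A toy grid-search sketch, with made-up unigram models and held-out words standing in for real data:

```python
import math

# Made-up unigram models; a real setup would load ARPA-format n-grams.
dialog_lm = {"hello": 0.5, "yes": 0.3, "no": 0.2}
generic_lm = {"hello": 0.1, "yes": 0.1, "no": 0.1, "the": 0.7}
heldout = ["hello", "yes", "hello", "no"]

def perplexity(lam, words):
    """Perplexity of held-out words under the lam-weighted mixture."""
    logprob = sum(
        math.log(lam * dialog_lm.get(w, 0.0)
                 + (1 - lam) * generic_lm.get(w, 0.0))
        for w in words
    )
    return math.exp(-logprob / len(words))

# Grid search over mixture weights (EM would converge to the same optimum).
best = min((l / 100 for l in range(1, 100)),
           key=lambda l: perplexity(l, heldout))
```

With these toy numbers the dialog model assigns every held-out word a higher probability, so the best weight ends up near 1; with realistic data the generic model's coverage pulls the weight back down.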

It's discouraging that sphinx4 doesn't support higher-order n-grams. Another article mentions a workaround for that: joining frequent word combinations into compound words.
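The compound-word trick can be sketched simply: merge frequent adjacent word pairs into single tokens before training, so a trigram over compounds captures context that would otherwise need a 4-gram. A toy sketch (the corpus and frequency threshold are made up):

```python
from collections import Counter

def join_compounds(sentences, min_count=2):
    """Replace frequent adjacent word pairs with underscore-joined tokens."""
    pairs = Counter((a, b) for s in sentences for a, b in zip(s, s[1:]))
    frequent = {p for p, c in pairs.items() if c >= min_count}
    joined = []
    for s in sentences:
        out, i = [], 0
        while i < len(s):
            # Greedily merge a pair if it is frequent enough.
            if i + 1 < len(s) and (s[i], s[i + 1]) in frequent:
                out.append(s[i] + "_" + s[i + 1])
                i += 2
            else:
                out.append(s[i])
                i += 1
        joined.append(out)
    return joined

corpus = [
    ["thank", "you", "very", "much"],
    ["thank", "you", "for", "calling"],
]
print(join_compounds(corpus))
# [['thank_you', 'very', 'much'], ['thank_you', 'for', 'calling']]
```

After this pass, a trigram ending in "thank_you" effectively spans three original words.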

Btw, the generic model gives 40% accuracy while the home-grown dialog model gives 60%, so it's a promising direction anyhow.

Cleanup strategies for acoustic models

An interesting discussion is going on at Voxforge about the cleanup of the acoustic database. It seems to me that we are really different from the usual research acoustic models, which are mostly properly balanced. We have a load of unchecked contributions with non-native accents and so on. But we still have to work with such a database and get sensible models from it. The Fisher experience showed that even loosely checked data can be useful for training. Although our data isn't as nicely transcribed as Fisher, it can still be useful if we apply training methods that account for the nature of the data collected.

I tried to find some articles about training acoustic models on incomplete data, but it seems that most such research is devoted to other domains like web classification. Web data is by definition incomplete and full of errors. We could reuse their unsupervised learning methods, but I failed to find information on this. Links are welcome.

Another interesting read I had today is about performance on the Fisher database. Articles mention that the baseline is around 22% WER at 20xRT speed. 20xRT is unacceptably slow, I think, but even at 5xRT we are close to this barrier. The thing that makes me wonder is that in sphinx4, widening the beams makes decoding slower but doesn't improve accuracy. It must be a bug, I think.

Nexiwave in MIT100k

Congratulations to Ben and the others on seeing Nexiwave in the MIT100k semifinal.

A great article about architecture management:

IEEE Software, March/April 2005 (Vol. 22, No. 2), pp. 19-27: "Architecture Decisions: Demystifying Architecture"


Behaviour guideline

Although the GNOME HIG nicely specifies how a GNOME application should look, it says nothing about how an application should behave. It's strange that our usability people don't take such an important thing into account. Even a small difference in program execution affects user satisfaction. For example, the Open dialog saves the location in one application but makes me browse to the same directory every time in another. The main point of this section is that programs should behave consistently, not only look consistent.

And some consequences.

It's impossible to get consistent behaviour without sharing a codebase. One can make an application written in Qt or with the Mozilla suite look like a Gtk application, but the user easily sees the difference in the way such applications do things. Once you open a settings dialog, the mirage of consistency disappears. The HIG should not be a set of recommendations everyone tries to follow, but documentation of hardcoded rules that anyone using the library follows automatically.

Integration with other toolkits doesn't make any sense, much like supporting software on a different platform. If an application uses another codebase, it will behave differently. Take, for example, gecko or gtk-mozembed applications. They all have problems with keyboard focus and accessibility. It's impossible to make them work the way a GNOME user expects. Even if you get them to look similar, it's impossible to maintain such consistency every time something changes in gtk.