How To Choose an Embedded Speech Recognizer

There are quite a few options for building an open source speech recognition system on a low-resource device, and choosing between them is hard. For example, you might need speech recognition on a platform like the Raspberry Pi and be weighing HTK, CMUSphinx, Julius and many other implementations.

To make an informed decision, you need to consider a set of features specifically required to run speech recognition in a low-resource environment. Without them your system may well be accurate, but it will consume too many resources to be useful. Some of them are:

Features for a small memory footprint:
  •  Support for semi-continuous models
  •  Quantized and pruned data structures: mixture weights quantized to 4 bits and pruned, acoustic scores quantized to 16 bits
  •  Fixed-point arithmetic
  •  Bitvector structures
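
To make the quantization item concrete, here is a minimal Python sketch of storing mixture weights in 4 bits each, two per byte. It is only an illustration under stated assumptions: the log-scale mapping, the `floor` value of 1e-4 and all function names are hypothetical and are not taken from any particular engine.

```python
import math

def quantize_4bit(weights, floor=1e-4):
    """Map each weight in (0, 1] to a 4-bit code on a log scale.

    Weights below `floor` are clamped to it (a crude form of pruning).
    The floor value is an assumption chosen for this illustration.
    """
    log_floor = math.log(floor)
    codes = []
    for w in weights:
        lw = math.log(min(max(w, floor), 1.0))  # lies in [log_floor, 0]
        codes.append(round(15 * (1 - lw / log_floor)))  # floor -> 0, 1.0 -> 15
    return codes

def pack_nibbles(codes):
    """Pack two 4-bit codes into every byte, padding with 0 if needed."""
    if len(codes) % 2:
        codes = codes + [0]
    return bytes((codes[i] << 4) | codes[i + 1] for i in range(0, len(codes), 2))

def unpack_nibbles(packed, count):
    """Recover the first `count` 4-bit codes from the packed bytes."""
    codes = []
    for byte in packed:
        codes.append(byte >> 4)
        codes.append(byte & 0x0F)
    return codes[:count]

def dequantize_4bit(codes, floor=1e-4):
    """Invert quantize_4bit, up to the quantization error."""
    log_floor = math.log(floor)
    return [math.exp(log_floor * (1 - c / 15)) for c in codes]

weights = [0.5, 0.25, 0.125, 0.1, 0.025]
codes = quantize_4bit(weights)
packed = pack_nibbles(codes)  # 5 weights fit in 3 bytes
restored = dequantize_4bit(unpack_nibbles(packed, len(weights)))
```

Compared with a 32-bit float per weight, this layout takes one eighth of the memory, and the price is a bounded error in the log domain.
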
Features for fast computation:
  •  Top Gaussian selection
  •  Simplified lextree search without cross-word context
  •  Multipass processing with tunable performance on each step
  •  Cache access optimization for increased memory throughput
  •  Downsampling
  •  Phone lookahead
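
As an illustration of top Gaussian selection, the sketch below scores a feature frame against a shared codebook of diagonal-covariance Gaussians, keeps only the N best, and then computes a state score from those N alone. It assumes a semi-continuous setup where every state has its own mixture weights over the same codebook. All names and the toy numbers are hypothetical, and this is not the code of any specific engine.

```python
import heapq
import math

def log_gaussian(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at point x."""
    return -0.5 * sum(
        math.log(2 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )

def top_n_gaussians(x, means, variances, n):
    """Return the n best (log_density, index) pairs for frame x, best first."""
    scored = [
        (log_gaussian(x, m, v), i)
        for i, (m, v) in enumerate(zip(means, variances))
    ]
    return heapq.nlargest(n, scored)

def state_log_score(top, mixture_weights):
    """Log-sum-exp of weight * density over the selected Gaussians only."""
    terms = [score + math.log(mixture_weights[i]) for score, i in top]
    peak = max(terms)
    return peak + math.log(sum(math.exp(t - peak) for t in terms))

# Toy codebook with four 2-dimensional Gaussians (made-up values).
means = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0], [-4.0, 3.0]]
variances = [[1.0, 1.0]] * 4
state_weights = [0.25, 0.25, 0.25, 0.25]
frame = [0.9, 1.1]

top2 = top_n_gaussians(frame, means, variances, 2)
approx = state_log_score(top2, state_weights)
exact = state_log_score(top_n_gaussians(frame, means, variances, 4), state_weights)
```

The codebook is evaluated once per frame, and every state then sums over N terms instead of the whole codebook. Gaussians far from the frame contribute almost nothing, so `approx` stays very close to `exact`.
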
Support for popular mobile platforms:
  • Out-of-box support for Android
  • Out-of-box support for iPhone
  • Out-of-box support for embedded Linux systems like BeagleBoard
There are many other features that help speech recognition as well. Apart from commercial engines, the only engine that implements all of the features above is Pocketsphinx.

You can learn more about Pocketsphinx features from the publication:

You can learn how to optimize Pocketsphinx for a low-resource environment from the wiki page:

Training acoustic models for an embedded device also has some specifics that Pocketsphinx requires, so Sphinxtrain is the optimal solution here.

There are also demos for Android and iPhone.
