Adaptation Methods

It's really hard to collect information on the practical application of speech recognition tools. For example, here is a wonderful quote from Andrew Morris on htk-users about what to update during MAP adaptation:

Exactly what it is best to update depends on how much training data you have, but in general it is important to update means and inadvisable to update variances. Only testing on held out test data can decide which is best, but if you are training on data from many speakers and then adapting to data from just one speaker, I expect updating just means should give best results, with variance adaptation reducing performance and transition probs or mix weights adaptation making little difference.

After a few experiments I can only confirm this statement. You should never adapt the variances. So the HOWTO in our wiki is not as good as it could be. Another useful bit can be taken from this document: it really is better to combine MAP and MLLR, and the best method for offline adaptation is:
  • Run bw to collect statistics
  • Estimate the MLLR transform
  • Update the means with MLLR
  • Run bw again with updated means
  • Apply MAP adaptation with a fixed tau greater than 100 (try a few values and pick the best one). Unfortunately, in my experience automatic tau selection is broken in map_adapt. This way you'll still update the variances, but only slightly.
No book could tell you that!
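
To see why a large fixed tau keeps the adapted model close to the prior, here is a minimal sketch of the textbook MAP update for a single Gaussian mean. It is an illustration only: the function name map_update_mean and the toy data are made up for this example, and it is not claimed to match what map_adapt does internally.

```python
def map_update_mean(prior_mean, frames, posteriors, tau):
    """Textbook MAP re-estimate of one Gaussian mean with a fixed prior weight tau.

    prior_mean  -- mean from the speaker-independent model (list of floats)
    frames      -- adaptation feature vectors (list of lists of floats)
    posteriors  -- occupation probability of this Gaussian for each frame
    tau         -- prior weight; larger values trust the prior mean more
    (All names here are hypothetical and chosen for illustration.)
    """
    occupancy = sum(posteriors)
    dim = len(prior_mean)
    # Posterior-weighted sum of the adaptation frames, per dimension.
    weighted_sum = [
        sum(g * x[d] for g, x in zip(posteriors, frames)) for d in range(dim)
    ]
    # Interpolate between the prior mean and the data, weighted by tau and occupancy.
    return [
        (tau * prior_mean[d] + weighted_sum[d]) / (tau + occupancy)
        for d in range(dim)
    ]


if __name__ == "__main__":
    prior = [0.0, 0.0]
    frames = [[1.0, 2.0]] * 50   # toy adaptation data, all frames identical
    posteriors = [1.0] * 50      # assume every frame is fully assigned
    for tau in (10.0, 100.0, 1000.0):
        print(tau, map_update_mean(prior, frames, posteriors, tau))
```

With 50 frames of adaptation data, the mean moves most of the way toward the data at tau = 10 and barely moves at tau = 1000. The same interpolation idea applies to any other parameter updated by MAP, which fits the observation that a tau above 100 changes the variances only slightly.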

