Blizzard 2009 results available

It was pleasant to find out that the results of the Blizzard Challenge 2009 are now available. Many thanks to the organizers and participants!

Reading the papers took me half a day, spent solving the usual Einstein-style puzzle of figuring out who gave the best results and what had changed. Unfortunately, it takes too much time to read everything in detail. There is no summary of the methods and systems used this year, of the achievements since last year, or any explanation of the results. I could only start with the following:

  1. iFlytek Speech Lab and IVO Software are still the best. Unit selection systems win.
  2. DFKI, which I was a fan of, unfortunately can't reach a commercial level even with unit selection. That probably means unit selection is not the only key issue.
  3. I like the progress muXac and Mike have made over the years.
  4. The ES3 task of building a voice from a small amount of speech seems rather pointless. Wouldn't we want to use voice adaptation in this case?
  5. It's interesting that machine learning for join and target cost optimization is popular nowadays.
  6. Though there was a telephone TTS task, it seems to me that nobody did anything specific to TTS over telephone lines. The differences shouldn't be large; only the 8 kHz sampling rate is an issue, or even an advantage, but even this point isn't covered in any of the papers, or at least I didn't notice it.
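To make point 5 concrete, here is a toy sketch of how target and join costs combine in a unit-selection search. It is a hypothetical illustration, not the method of any Blizzard entry: a "unit" is reduced to an assumed (pitch_hz, duration_ms) pair, the cost functions are simple weighted absolute differences, and the weights `w_target` and `w_join` are set by hand. Those weights are the kind of parameters the machine-learning approaches try to tune automatically.

```python
"""Toy unit-selection search: weighted target + join costs, Viterbi decoding.

Hypothetical sketch, not taken from any Blizzard entry. A "unit" is reduced to
(pitch_hz, duration_ms); real systems use far richer features.
"""
from typing import List, Sequence, Tuple

Unit = Tuple[float, float]  # hypothetical features: (pitch_hz, duration_ms)


def target_cost(target: Unit, unit: Unit, w: Sequence[float]) -> float:
    """Weighted mismatch between the desired prosody and a candidate unit."""
    return w[0] * abs(target[0] - unit[0]) + w[1] * abs(target[1] - unit[1])


def join_cost(left: Unit, right: Unit, w_join: float) -> float:
    """Penalty for concatenating two units: here only their pitch mismatch."""
    return w_join * abs(left[0] - right[0])


def select_units(targets: List[Unit], candidates: List[List[Unit]],
                 w_target: Sequence[float], w_join: float) -> Tuple[List[int], float]:
    """Return the chosen candidate index for each position and the total cost."""
    cost = [[target_cost(targets[0], u, w_target) for u in candidates[0]]]
    back: List[List[int]] = [[-1] * len(candidates[0])]
    for i in range(1, len(targets)):
        prev = candidates[i - 1]
        row_cost, row_back = [], []
        for u in candidates[i]:
            # Best predecessor = cheapest accumulated cost plus the join into u.
            scores = [cost[i - 1][j] + join_cost(prev[j], u, w_join)
                      for j in range(len(prev))]
            best_j = min(range(len(prev)), key=scores.__getitem__)
            row_cost.append(scores[best_j] + target_cost(targets[i], u, w_target))
            row_back.append(best_j)
        cost.append(row_cost)
        back.append(row_back)
    # Trace back from the cheapest final candidate.
    j = min(range(len(cost[-1])), key=cost[-1].__getitem__)
    total = cost[-1][j]
    path = [j]
    for i in range(len(targets) - 1, 0, -1):
        j = back[i][j]
        path.append(j)
    path.reverse()
    return path, total


if __name__ == "__main__":
    targets = [(100.0, 80.0), (110.0, 90.0)]
    candidates = [[(100.0, 80.0), (150.0, 80.0)],
                  [(150.0, 90.0), (112.0, 90.0)]]
    # Strong target weight: prefer units close to the desired pitch.
    print(select_units(targets, candidates, (1.0, 0.1), 1.0))
    # Weak target weight: prefer a smooth pitch join instead.
    print(select_units(targets, candidates, (0.1, 0.0), 1.0))
```

With these made-up numbers, the first call picks units closest to the target pitch, while the second, with a much lower target weight, picks the pair with no pitch jump at the join. The same candidates give different selections depending only on the weights, which is why optimizing them automatically is attractive.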
A short summary of the systems:
  • Aholab - unit selection; spent one day building the voice, so nothing good to expect
  • WISTON - Mandarin prosody is a key feature, but the paper doesn't describe the challenge entry
  • Cereproc - experiment with combining HTS and unit selection, bad results for an unknown reason, 4 man-days spent
  • CMU - the paper is not available, but you can try clustergen yourself in stock Festival
  • CSTR - CSTR has started investigating HTS methods. A good start, no results yet.
  • DFKI - spent the year adding Turkish TTS and implementing Mary 4.0
  • Edinburgh/Idiap - interesting unsupervised entry, results are obviously lower
  • I2R - good TTS, unit selection
  • Ivona - unit selection with pitch modification by an interestingly named algorithm, best English system together with iFlytek
  • CircumReality - unit selection with pitch modification by TD-PSOLA, best progress over the years
  • NICT - HTS, GV, MGE and a lot of math
  • NIT - HTS with STRAIGHT, best HTS system here, best Mandarin as well
  • NTUT - Mandarin HTS, not very interesting
  • PKU - another Mandarin HTS with STRAIGHT
  • Toshiba - good unit selection system, interesting method for fuzzy combination of units
  • iFlytek - HMM-driven unit selection, best English system together with Ivona
  • VUB - unit selection with WPSOLA, average results, though there is an interesting link to the SPRAAK open source recognition toolkit, which is not completely open but has an interesting description
Still, the challenge itself is very interesting, and I'm looking forward to the results of the next one.

Another cool bit of hardware for database training

It's sometimes hard to quickly adopt the new opportunities the world provides. I'm now reading The Innovator's Dilemma by Clayton M. Christensen. Thanks to Ellias for the recommendation; it really seems like a good book.

The interesting thing is that the author starts with a description of the hard drive industry as the fastest-moving one, with innovation outpacing customer needs. And guess what? The hard drive industry strikes back with SSDs. I had read that they exist, but I didn't understand their value for acoustic model training. Even without profiling, it's clear they will be extremely useful.

Say you have a medium-size acoustic database: 60 hours of speech, a few gigabytes on disk. To process it fast, you need an 8-core machine. Here comes the bottleneck: imagine 8 processes reading feature vectors from the disk in an almost random order. It's easy to guess that the hard drive will be very busy seeking to fetch all the data required. An SSD could definitely help here; I really need to try it soon.
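A quick back-of-the-envelope check of these numbers, as a sketch under assumptions that are not from the post: 100 frames per second (10 ms shift), 39-dimensional MFCC-plus-delta features stored as 32-bit floats, an average utterance of 5 seconds stored as one feature file, and roughly 10 ms per random seek on a spinning disk. The seek estimate is optimistic (one seek per file, no fragmentation) and covers a single pass, while EM-style training reads the data many times.

```python
"""Back-of-the-envelope sizing for an acoustic training corpus.

All defaults are illustrative assumptions: 100 frames/s (10 ms shift),
39-dim MFCC+delta features as 32-bit floats, 5 s average utterance,
and ~10 ms per random seek on a spinning disk.
"""


def feature_bytes(hours: float, frames_per_sec: int = 100,
                  dims: int = 39, bytes_per_value: int = 4) -> int:
    """Total size of the feature files for `hours` of speech."""
    frames = int(hours * 3600 * frames_per_sec)
    return frames * dims * bytes_per_value


def seek_seconds_per_pass(hours: float, utt_seconds: float = 5.0,
                          seek_ms: float = 10.0) -> float:
    """Time spent on seeks alone if every utterance file costs one random seek."""
    utterances = hours * 3600 / utt_seconds
    return utterances * seek_ms / 1000.0


if __name__ == "__main__":
    size = feature_bytes(60)
    print(f"features: {size / 1e9:.2f} GB")  # 3.37 GB
    print(f"seek time per pass: {seek_seconds_per_pass(60):.0f} s")  # 432 s
```

Under these assumptions, 60 hours gives about 3.4 GB of features, consistent with "a few gigabytes", and around 43,200 files, so a spinning disk spends several minutes per pass on seeks alone when 8 processes interleave their reads. An SSD has no mechanical seek, which is where it should pay off.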