Blizzard 2009 results available

It was pleasant to find out that the results of the Blizzard Challenge 2009 are now available. Thanks a lot to the organizers and participants!

Reading the articles took me half a day, trying to solve the usual Einstein-type puzzle of figuring out who got the best results and what has changed. Unfortunately, it takes too much time to read everything in detail. There is no summary of the methods and systems used this year, of the achievements since last year, or of explanations for the results. I could only start with the following:

  1. iFlytek Speech Lab and IVO Software are still the best. Unit selection systems win.
  2. DFKI, which I was a fan of, unfortunately can't jump to a commercial level even with unit selection. That probably means that unit selection is not the only key issue.
  3. I like the progress muXac and Mike have been making over the years.
  4. The ES3 task of building a voice from a small amount of speech is kind of senseless. Don't we want to use voice adaptation in this case?
  5. It is interesting that machine learning for join and target cost optimization is popular nowadays.
  6. Though there was a telephone TTS task, it seems to me that nobody did anything specific to TTS over telephone lines. The differences shouldn't be large; only the 8 kHz sampling rate is an issue, or even an advantage, but even this point is not covered in any of the articles, or at least I didn't notice it.
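
For readers who have not met the terms in point 5: a unit selection system picks one recorded unit per target position so that the sum of target costs (how well a unit matches what is wanted) and join costs (how smoothly neighbouring units concatenate) is minimal. Below is a minimal sketch of that search as a Viterbi pass. It is not taken from any of the Blizzard entries; the function names, the pitch-only toy costs, and the weights `w_target` and `w_join` are all assumptions made for illustration. The weights are the kind of parameter that the machine learning approaches try to tune.

```python
def select_units(targets, candidates, target_cost, join_cost,
                 w_target=1.0, w_join=1.0):
    """Viterbi search over candidate units (illustrative sketch only).

    targets[i]    -- desired specification for position i
    candidates[i] -- non-empty list of database units usable at position i
    Returns (total_cost, chosen_units).
    """
    if not targets:
        return 0.0, []
    # best[i][j] = (cost of cheapest path ending in candidates[i][j], backpointer)
    best = [[(w_target * target_cost(targets[0], u), -1) for u in candidates[0]]]
    for i in range(1, len(targets)):
        row = []
        for u in candidates[i]:
            tc = w_target * target_cost(targets[i], u)
            # cheapest way to arrive at u from any unit at the previous position
            cost, back = min(
                (best[i - 1][k][0] + w_join * join_cost(prev, u), k)
                for k, prev in enumerate(candidates[i - 1])
            )
            row.append((cost + tc, back))
        best.append(row)
    # backtrack from the cheapest final unit
    j = min(range(len(best[-1])), key=lambda k: best[-1][k][0])
    total = best[-1][j][0]
    path = []
    for i in range(len(targets) - 1, -1, -1):
        path.append(candidates[i][j])
        j = best[i][j][1]
    path.reverse()
    return total, path


# Toy data (hypothetical): a unit is (name, pitch_hz), a target is a desired pitch.
def pitch_target_cost(target_pitch, unit):
    return abs(target_pitch - unit[1])


def pitch_join_cost(prev_unit, unit):
    return abs(prev_unit[1] - unit[1])


targets = [100, 120]
candidates = [[("a1", 100), ("a2", 130)], [("b1", 90), ("b2", 125)]]
total, units = select_units(targets, candidates,
                            pitch_target_cost, pitch_join_cost)
print(total, [name for name, _ in units])
```

Real systems use much richer costs (spectral distance at the join, phonetic context, duration and so on), but the structure of the search stays the same.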
Short summary on systems:
  • Aholab - unit selection, spent one day on building the voice, so nothing good to expect
  • WISTON - Mandarin prosody is a key feature, but the article doesn't describe the challenge
  • Cereproc - experiment with combining HTS and unit selection, bad results for an unknown reason, 4 man-days spent
  • CMU - article is not available, but you can try clustergen yourself in stock Festival
  • CSTR - has started investigations into HTS methods. Good start, no results yet.
  • DFKI - spent the year on adding Turkish TTS and the Mary 4.0 implementation
  • Edinburgh/Idiap - interesting unsupervised entry, results are obviously lower
  • I2R - good TTS, unit selection
  • Ivona - unit selection with pitch modification by an interestingly named algorithm, best English one together with iFlytek
  • CircumReality - unit selection with pitch modification by TD-PSOLA, best progress over the years
  • NICT - HTS, GV, MGE and a lot of math
  • NIT - HTS with STRAIGHT, best HTS here, best Mandarin as well
  • NTUT - Mandarin HTS, not so interesting
  • PKU - another Mandarin HTS with STRAIGHT
  • Toshiba - good unit selection system, interesting method for fuzzy combination of units
  • iFlytek - HMM-driven unit selection, best English one together with Ivona
  • VUB - unit selection with WPSOLA, average, though there is an interesting link to the SPRAAK open source recognition toolkit, which is not completely open but has an interesting description
Still, the challenge itself is very interesting and I'm looking forward to the next challenge results.
