Three Generations of IVR Systems

Recently I came up with a nice new concept for marketing people. Basically, there are three generations of IVR systems right now:
  • Generation 1.0 - Static systems based on VoiceXML. It was surprising to me that they are still in wide use and that a lot of products are dedicated to their optimization and development. There are IDEs, a lot of testing tools, and recommendations on how to build proper VoiceXML. Come on, it's impossible to do that. It's something like the static HTML websites that were popular in 1995. I don't believe that changes like JavaScript inside VXML 3.0 will stop its slow death.
  • Generation 2.0 - Dynamic systems like Tropo from Voxeo. Much easier, much better. More control over content, more integration with the business logic. I really believe it's the next generation because it gives the developer much more control over the dialog. At the very least, with the power of a real scripting language like Python you'll be able to implement something non-trivial in just a few lines of code. That's the AJAX or RoR of the speech world.
  • Generation 3.0 - Semantic-based IVR. This consists of three components: a large vocabulary recognizer, a semantic recognizer on top of it, and event-based actions on top of that. Probably also emotion recognition and more intelligent dialog tracking. As I see it, the developer has to define the structure of the dialog and provide handlers. Such a system was described and developed at CMU a long time ago, and it's also described in all ASR textbooks. But I'm not aware of any widely known platform that allows you to build this kind of IVR. Once again, it shows how big the gap is between academia and software developers.
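
To make the generation 3.0 idea a bit more concrete, here is a minimal Python sketch of the last two layers: a semantic recognizer that turns a free-form transcript into an intent with slots, and event-based handlers that the developer registers for each intent. Everything in it is an assumption made for illustration: the Dialog class, the parse_semantics function, the intent names and the regular expressions are hypothetical, and a few regexes stand in for a real semantic recognizer. The transcript is assumed to come from a large vocabulary recognizer such as CMUSphinx, which is not called here.

```python
import re
from typing import Callable, Dict, Tuple

# Hypothetical intent patterns; a toy stand-in for a real semantic recognizer.
INTENT_PATTERNS = {
    "check_balance": re.compile(r"\b(?:balance|how much)\b"),
    "transfer": re.compile(r"\b(?:transfer|send)\b.*?\b(?P<amount>\d+)\b"),
}


def parse_semantics(transcript: str) -> Tuple[str, Dict[str, str]]:
    """Map free-form recognizer output to an (intent, slots) pair."""
    text = transcript.lower()
    for intent, pattern in INTENT_PATTERNS.items():
        match = pattern.search(text)
        if match:
            return intent, match.groupdict()
    return "unknown", {}


class Dialog:
    """Holds the handlers the developer provides, keyed by intent name."""

    def __init__(self) -> None:
        self.handlers: Dict[str, Callable[[Dict[str, str]], str]] = {}

    def on(self, intent: str):
        """Decorator that registers a handler for one intent."""
        def register(func):
            self.handlers[intent] = func
            return func
        return register

    def dispatch(self, transcript: str) -> str:
        """Run the semantic layer, then fire the matching handler."""
        intent, slots = parse_semantics(transcript)
        handler = self.handlers.get(intent) or self.handlers.get("unknown")
        if handler is None:
            return "Sorry, I did not understand."
        return handler(slots)


dialog = Dialog()


@dialog.on("check_balance")
def handle_balance(slots: Dict[str, str]) -> str:
    # A real handler would call the business logic here.
    return "Your balance is 100 dollars."


@dialog.on("transfer")
def handle_transfer(slots: Dict[str, str]) -> str:
    return f"Transferring {slots['amount']} dollars."


@dialog.on("unknown")
def handle_unknown(slots: Dict[str, str]) -> str:
    return "Sorry, could you rephrase that?"


if __name__ == "__main__":
    # The strings below play the role of large vocabulary recognizer output.
    print(dialog.dispatch("I would like to know my balance please"))
    print(dialog.dispatch("please transfer 50 dollars to John"))
```

The point of the design is that the caller is free to say anything; the developer only describes which meanings the dialog cares about and what should happen for each of them, instead of scripting a fixed menu tree.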
If you are planning to create an IVR application with CMUSphinx, please consider IVR generation 3 as your base technology ;) And don't forget to share the code.


Very much on the same topic from a wonderful Nu Echo blog:

