PLP is going to be the default soon

It looks like MFCC features are about to become history. Everyone is using 9 spliced PLP frames followed by an LDA projection down to 40-50 values. A few examples: Google in its audio indexing system, IBM and BBN (see the system descriptions in their results), OGI/ICSI and many others.
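To make the "9 frames + LDA" recipe concrete, here is a minimal NumPy sketch. It is not taken from any of the systems mentioned above. The 13 coefficients per frame, the random stand-in features, the random class labels (in a real system these would come from phone or HMM state alignments) and the function names `splice_frames` and `lda_matrix` are all assumptions made for illustration.

```python
import numpy as np

def splice_frames(feats, context=4):
    """Stack each frame with `context` neighbours on each side.

    context=4 gives the 9-frame window discussed above.
    """
    n_frames = feats.shape[0]
    # Repeat the edge frames so the first and last frames get a full window.
    padded = np.pad(feats, ((context, context), (0, 0)), mode="edge")
    return np.hstack([padded[i:i + n_frames] for i in range(2 * context + 1)])

def lda_matrix(x, labels, out_dim):
    """Estimate an LDA projection of shape (in_dim, out_dim) from labelled vectors."""
    dim = x.shape[1]
    mean = x.mean(axis=0)
    s_w = np.zeros((dim, dim))  # within-class scatter
    s_b = np.zeros((dim, dim))  # between-class scatter
    for c in np.unique(labels):
        xc = x[labels == c]
        mc = xc.mean(axis=0)
        s_w += (xc - mc).T @ (xc - mc)
        d = (mc - mean)[:, None]
        s_b += len(xc) * (d @ d.T)
    # Small ridge term (an assumption) keeps s_w safely invertible.
    s_w += 1e-6 * np.eye(dim)
    evals, evecs = np.linalg.eig(np.linalg.solve(s_w, s_b))
    # Keep the directions with the largest class separation.
    order = np.argsort(-evals.real)[:out_dim]
    return evecs[:, order].real

# Stand-in data: 2000 frames of 13 "PLP" coefficients, 50 hypothetical classes.
rng = np.random.default_rng(0)
plp = rng.normal(size=(2000, 13))
labels = rng.integers(0, 50, size=2000)

spliced = splice_frames(plp, context=4)      # shape (2000, 117)
projection = lda_matrix(spliced, labels, 40)  # shape (117, 40)
reduced = spliced @ projection                # shape (2000, 40)
```

Note that LDA yields at most (number of classes - 1) useful directions, so projecting to 40-50 values only makes sense with more classes than that, which is the case when the classes are phones or tied states.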

The issue right now is that the sphinx4 PLP implementation seems to be broken: it produces garbage features that don't give enough accuracy after training. Luckily there is HTK. Once this issue gets fixed, I think I'll retrain a PLP + MLLT model for Voxforge. Unfortunately, I don't have any definite plan for implementing PLP in sphinxbase.
