Speech Recognition in GSoC Done Right

Year after year, many end-user projects try to push ASR forward with the help of Google and the students of the Summer of Code program. If the CMUSphinx team knows all about ASR, why should we stay away from that?

I have had varied experiences with Google Summer of Code before, but I still like the process and enjoy communicating with new people. I think we have a good chance of succeeding here, so I have written up an application proposal and an initial list of ideas.


I will submit this proposal on March 8, after the program starts. We need more ideas now, as many as you can generate.


We need a more or less representative list. If you want to be a mentor, don't hesitate to write down your IRC nick as well.

Noise reduction filtering in sphinx4

There is a huge gap between stock sphinx4 and a real ASR system, since critical parts like noise filtering, speaker diarization and postprocessing are missing, not to mention online adaptation. The default frontend is less than optimal for several reasons. For example, it doesn't handle DC offset at all, and it uses an energy-based endpointer in the time domain, which is not robust to additive noise.
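
To illustrate why an unhandled DC offset hurts an energy-based endpointer, here is a minimal Python sketch. It is not sphinx4 code: the function names, the frame length and the sample values are all made up for the illustration, and mean subtraction is just one simple way to cancel a constant offset.

```python
import math

def frame_energy_db(frame):
    """Log energy of one frame, the quantity an energy-based endpointer thresholds."""
    energy = sum(s * s for s in frame) / len(frame)
    return 10.0 * math.log10(energy + 1e-10)

def remove_dc(frame):
    """Subtract the frame mean, a crude way to cancel a constant DC offset."""
    mean = sum(frame) / len(frame)
    return [s - mean for s in frame]

# Hypothetical "silent" frame: a tiny alternating signal riding on a DC offset of 500.
silence = [500.0 + (1.0 if i % 2 == 0 else -1.0) for i in range(160)]

raw_db = frame_energy_db(silence)
clean_db = frame_energy_db(remove_dc(silence))
print(f"with DC offset: {raw_db:.1f} dB, after removal: {clean_db:.1f} dB")
```

The frame contains almost no signal, yet its energy looks high until the offset is removed, so a fixed energy threshold could easily mistake it for speech.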

As of today, sphinx4 includes an implementation of a Wiener filter that reduces noise and helps the voice activity detector as well. To try it, check out the latest trunk and change the frontend pipeline as follows:

<item>audioFileDataSource</item>
<item>dataBlocker</item>
<item>preemphasizer</item>
<item>windower</item>
<item>fft</item>
<item>wiener</item>
<item>speechClassifier</item>
<item>speechMarker</item>
<item>nonSpeechDataFilter</item>
<item>melFilterBank</item>
<item>dct</item>
<item>liveCMN</item>
<item>featureExtraction</item>

Then define the wiener component:

<component name="wiener" type="edu.cmu.sphinx.frontend.endpoint.WienerFilter">
    <property name="classifier" value="speechClassifier"/>
</component>

This frontend is robust to DC offset and also handles noise better. To try it on noisy input, you can mix in white noise with sox:

 sox 10001-90210-01803.wav noise.wav synth white
 sox noise.wav smallnoise.wav vol -45d
 sox -m 10001-90210-01803.wav smallnoise.wav 10001-90210-01803-noisy.wav
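
To get a feel for how much noise this adds, here is a rough Python sketch of the same arithmetic. It assumes the `-45d` above is read as a gain of -45 dB. The sine wave and the uniform noise are hypothetical stand-ins for the recording and for the sox-generated noise, so the SNR it prints is only indicative.

```python
import math
import random

def db_to_amplitude(gain_db):
    """Convert a gain in decibels to a linear amplitude factor."""
    return 10.0 ** (gain_db / 20.0)

def power(samples):
    """Mean power of a sequence of samples."""
    return sum(s * s for s in samples) / len(samples)

def snr_db(signal, noise):
    """Signal-to-noise ratio in dB between a signal and the noise added to it."""
    return 10.0 * math.log10(power(signal) / power(noise))

random.seed(0)  # fixed seed so the sketch is repeatable
# Hypothetical stand-ins: one second of a 440 Hz tone and full-scale white noise at 16 kHz.
speech = [0.3 * math.sin(2 * math.pi * 440 * n / 16000) for n in range(16000)]
noise = [random.uniform(-1.0, 1.0) for _ in range(16000)]

scale = db_to_amplitude(-45.0)             # linear factor for a -45 dB gain
small_noise = [scale * n for n in noise]   # same role as smallnoise.wav
noisy = [s + n for s, n in zip(speech, small_noise)]  # same role as the mixed file

print(f"amplitude factor: {scale:.5f}")
print(f"resulting SNR: {snr_db(speech, small_noise):.1f} dB")
```

A gain of -45 dB multiplies every noise sample by about 0.0056. Raising or lowering that figure is the simplest way to test the frontend at different noise levels.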

It would be nice to try it with the Aurora database as well.

This filter is very simple and has a number of disadvantages. For example, it sometimes corrupts the spectrum with harmonic noise and thus makes recognition even worse. But it definitely helps in the presence of noise. Let's hope that one day more sophisticated implementations, like the Ephraim-Malah filter or even noise reduction with vector Taylor series, will be made available in the default configurations.
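
For readers unfamiliar with the idea, the core of a basic Wiener-style filter can be sketched in a few lines of Python. This is a textbook-style illustration, not the sphinx4 implementation: the function names, the gain floor and the smoothing constant `alpha` are assumptions chosen for the example. The noise spectrum is assumed to be updated only on frames that a voice activity detector marks as non-speech.

```python
def wiener_gain(noisy_power, noise_power, floor=0.05):
    """Per-bin gain G = SNR / (1 + SNR), with the SNR estimated by power
    subtraction and a floor that limits over-suppression."""
    gains = []
    for p, n in zip(noisy_power, noise_power):
        if n <= 0.0:
            gains.append(1.0)  # no noise estimate for this bin: pass it through
            continue
        snr = max(p / n - 1.0, 0.0)
        gains.append(max(snr / (1.0 + snr), floor))
    return gains

def update_noise(noise_power, frame_power, alpha=0.9):
    """Recursive average of the noise spectrum, meant to be called only on
    frames classified as non-speech."""
    return [alpha * n + (1.0 - alpha) * p for n, p in zip(noise_power, frame_power)]

# Hypothetical 4-bin power spectra.
noise_est = [1.0, 1.0, 1.0, 1.0]
frame = [1.0, 2.0, 10.0, 100.0]
gains = wiener_gain(frame, noise_est)
print([round(g, 3) for g in gains])  # prints [0.05, 0.5, 0.9, 0.99]
```

Bins close to the noise level are pushed down to the floor, while bins well above it pass almost unchanged. Abrupt changes in gain between neighbouring bins and frames are one source of the artifacts mentioned above.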