Noise reduction filtering in sphinx4

There is a huge gap between stock sphinx4 and a real ASR system, since critical parts like noise filtering, speaker diarization and postprocessing are missing, not to mention online adaptation. The default frontend is less than optimal for several reasons. For example, it doesn't handle DC offset at all, and it uses an energy-based endpointer in the time domain, which is not very robust to additive noise.
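
To see why an unhandled DC offset hurts a time-domain energy measure, here is a minimal Python sketch. It is not sphinx4 code: the function names, the 160-sample frame and the 0.5 offset are all made up for illustration. It only shows that a constant offset can raise the measured frame energy by tens of dB, and that subtracting the frame mean brings it back.

```python
import math

def frame_energy_db(frame):
    """Mean-square energy of one frame in dB, the kind of time-domain
    measure an energy-based endpointer thresholds (illustrative only)."""
    energy = sum(s * s for s in frame) / len(frame)
    return 10.0 * math.log10(max(energy, 1e-10))

def remove_dc(frame):
    """Subtract the frame mean, the simplest form of DC offset removal."""
    mean = sum(frame) / len(frame)
    return [s - mean for s in frame]

# A very quiet 160-sample frame (background level), with and without
# a constant offset of 0.5 (a hypothetical miscalibrated input).
n = 160
quiet = [0.01 * math.sin(2.0 * math.pi * 5 * i / n) for i in range(n)]
shifted = [s + 0.5 for s in quiet]

print(round(frame_energy_db(quiet), 1))               # quiet frame
print(round(frame_energy_db(shifted), 1))             # same frame plus offset
print(round(frame_energy_db(remove_dc(shifted)), 1))  # offset removed again
```

With a fixed energy threshold, the shifted frame would look like loud speech even though it contains almost nothing.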

As of today, sphinx4 includes an implementation of a Wiener filter that reduces noise and helps the voice activity detector as well. To try it, check out the latest trunk and change the frontend pipeline as follows:

<item>audioFileDataSource </item>
<item>dataBlocker </item>
<item>preemphasizer </item>
<item>windower </item>
<item>fft </item>
<item>wiener </item>
<item>speechClassifier </item>
<item>speechMarker </item>
<item>nonSpeechDataFilter </item>
<item>melFilterBank </item>
<item>dct </item>
<item>liveCMN </item>
<item>featureExtraction </item>

Then define the wiener component:

<component name="wiener"
type="edu.cmu.sphinx.frontend.endpoint.WienerFilter">
<property name="classifier" value="speechClassifier"/>
</component>

This frontend is robust to DC offset and also handles noise better. To try it on noisy input, you can mix in white noise with sox:

 sox 10001-90210-01803.wav noise.wav synth white
 sox noise.wav smallnoise.wav vol -45d
 sox -m 10001-90210-01803.wav smallnoise.wav 10001-90210-01803-noisy.wav
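
For a feel of how much noise that is, here is a rough Python equivalent of the mix. It is a sketch under assumptions: samples are floats in [-1, 1], the -45 is taken as a dB amplitude gain, and the noise is drawn uniformly from [-1, 1] (sox's own generator may differ). The helper names are invented for this example.

```python
import math
import random

def db_to_amplitude(db):
    """Convert a gain in dB to a linear amplitude factor."""
    return 10.0 ** (db / 20.0)

def add_white_noise(samples, level_db=-45.0, seed=0):
    """Add uniform white noise scaled by level_db to float samples."""
    rng = random.Random(seed)
    scale = db_to_amplitude(level_db)
    return [s + scale * rng.uniform(-1.0, 1.0) for s in samples]

def snr_db(clean, noisy):
    """Signal-to-noise ratio of the mix, given the clean reference."""
    signal = sum(c * c for c in clean)
    noise = sum((n - c) ** 2 for c, n in zip(clean, noisy))
    return 10.0 * math.log10(signal / noise)

# One second of a 440 Hz tone at 8 kHz stands in for the speech file.
rate = 8000
clean = [0.5 * math.sin(2.0 * math.pi * 440.0 * i / rate) for i in range(rate)]
noisy = add_white_noise(clean)

print(db_to_amplitude(-45.0))
print(snr_db(clean, noisy))
```

The resulting SNR depends on the level of the recording itself, so a quiet recording gets a noticeably worse SNR from the same -45 dB noise.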

It would be nice to try it with the Aurora database as well.

This filter is very simple and has a number of disadvantages. For example, it sometimes corrupts the spectrum with harmonic noise and thus makes recognition even worse. But it definitely helps in the presence of noise. Let's hope that one day more sophisticated implementations, like the Ephraim-Malah filter or even noise reduction with vector Taylor series, will be made available in the default configurations.
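
For readers who have not met a Wiener filter before, the basic idea fits in a few lines of Python. This is a textbook-style sketch, not the sphinx4 implementation: the function names, the smoothing factor and the gain floor are assumptions picked for illustration. The `is_speech` flag stands in for whatever a speech classifier would report; how the real component uses its `classifier` property is not shown here.

```python
def wiener_gains(noisy_power, noise_power, floor=0.05):
    """Per-bin gain G = max(P_noisy - P_noise, 0) / P_noisy, kept above
    a small floor. The gain multiplies the spectrum bin by bin."""
    gains = []
    for p_y, p_n in zip(noisy_power, noise_power):
        if p_y <= 0.0:
            gains.append(floor)
            continue
        clean_estimate = max(p_y - p_n, 0.0)
        gains.append(max(clean_estimate / p_y, floor))
    return gains

def update_noise(noise_power, frame_power, alpha=0.9):
    """Exponentially smooth the noise estimate with a new frame."""
    return [alpha * n + (1.0 - alpha) * p
            for n, p in zip(noise_power, frame_power)]

def process_frame(frame_power, noise_power, is_speech):
    """Update the noise estimate on non-speech frames only, then
    return the gains for this frame and the new noise estimate."""
    if not is_speech:
        noise_power = update_noise(noise_power, frame_power)
    return wiener_gains(frame_power, noise_power), noise_power

# Three bins: one dominated by speech, two at or below the noise level.
noise = [1.0, 1.0, 1.0]
gains, noise = process_frame([10.0, 1.0, 0.5], noise, is_speech=True)
print(gains)
```

Bins well above the noise estimate pass almost unchanged, while bins near it are pushed down to the floor. When the noise estimate is off, isolated bins survive from frame to frame, which is one way such a simple filter ends up adding tonal artifacts.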
