Using HTK models in sphinx4

As of yesterday, the long-awaited patch by Christophe Cerisara, with help from the super-fast Yaniv Kunda, has landed in svn trunk. You can now use HTK models directly from sphinx4. It isn't easy, though: I spent a few hours today figuring out the required steps, so here is a short step-by-step howto:

1. Update to sphinx4 trunk

2. Download a small model. Binary loading is unfortunately not supported yet, and loading a model from a huge text file takes a lot of resources. Get a model from Keith Vertanen:

http://www.inference.phy.cam.ac.uk/kv227/htk/htk_wsj_si84_2750_8.zip

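3. Convert the model to text format with HTK HHEd. The edit script (the file named empty) contains no commands, so HHEd should simply load the model and write it back out as text into the out directory: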

mkdir out
touch empty
HHEd -H hmmdefs -H macros -M out empty tiedlist


4. Replace the model in the Lattice demo configuration file:

<component name="wsj" type="edu.cmu.sphinx.linguist.acoustic.tiedstate.TiedStateAcousticModel">
    <property name="loader" value="wsjLoader"/>
    <property name="unitManager" value="unitManager"/>
</component>
<component name="wsjLoader" type="edu.cmu.sphinx.linguist.acoustic.tiedstate.HTKLoader">
    <property name="logMath" value="logMath"/>
    <property name="modelDefinition" value="/home/shmyrev/sphinx4/wsj/out/hmmdefs"/>
    <property name="unitManager" value="unitManager"/>
</component>


Note that the modelDefinition property points to the location of the newly created hmmdefs file.

5. Replace the frontend configuration so that it loads HTK features from a file. Unfortunately, the sphinx4 frontend cannot produce HTK features right now, though I hope this will be implemented soon. Some pieces are already present, like the DCT-II transform in frontend.transform.DiscreteCosineTransform2; some are easy to set up, like the proper filter coefficients; and some are missing. So for now we'll recognize an MFC file instead.

<component name="epFrontEnd" type="edu.cmu.sphinx.frontend.FrontEnd">
    <propertylist name="pipeline">
        <item>streamHTKSource</item>
    </propertylist>
</component>
<component name="streamHTKSource" type="edu.cmu.sphinx.frontend.util.StreamHTKCepstrum">
    <property name="cepstrumLength" value="39"/>
</component>


Then change the Java file of the demo so that it feeds the feature file to the new source:

// Look up the HTK source configured above and point it at the feature file
StreamHTKCepstrum source = (StreamHTKCepstrum) cm.lookup("streamHTKSource");
InputStream stream = new FileInputStream(new File("input.mfc"));
source.setInputStream(stream);


6. Now let's extract the MFC file. Create a config file for HCopy:

SOURCEFORMAT = WAV
TARGETKIND = MFCC_D_A_Z_0
TARGETRATE = 100000.0
WINDOWSIZE = 250000.0
USEHAMMING = T
PREEMCOEF = 0.97
NUMCHANS = 26
CEPLIFTER = 22
NUMCEPS = 12
ENORMALISE = T
ZMEANSOURCE = T
USEPOWER = T


and run it:

HCopy -C config 10001-90210-01803.wav input.mfc


Make sure input.mfc is now located in the top-level sphinx4 folder, since that is where the demo will read it from.
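Before running the demo, it can be worth checking that the feature file really matches the cepstrumLength of 39 from step 5. With TARGETKIND = MFCC_D_A_Z_0 and NUMCEPS = 12, each frame holds 12 cepstra plus C0 (13 values), plus their deltas and accelerations, so 3 x 13 = 39. Here is a minimal sketch that reads the 12-byte HTK header and prints the relevant numbers. The class HtkHeaderCheck is a hypothetical helper, not part of sphinx4 or HTK, and it assumes the default big-endian byte order and uncompressed 4-byte float features.

```java
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper for illustration; not part of sphinx4 or HTK.
class HtkHeaderCheck {
    final int numFrames;     // number of feature vectors in the file
    final int samplePeriod;  // frame shift in units of 100 ns
    final int bytesPerFrame; // size of one feature vector in bytes
    final int parmKind;      // base kind plus qualifier bits

    private HtkHeaderCheck(int numFrames, int samplePeriod, int bytesPerFrame, int parmKind) {
        this.numFrames = numFrames;
        this.samplePeriod = samplePeriod;
        this.bytesPerFrame = bytesPerFrame;
        this.parmKind = parmKind;
    }

    // Parse the header layout: int32, int32, int16, int16.
    // Assumption: big-endian byte order, which is what HTK writes by default.
    static HtkHeaderCheck parse(byte[] header) {
        if (header.length < 12) {
            throw new IllegalArgumentException("an HTK header is 12 bytes long");
        }
        ByteBuffer buf = ByteBuffer.wrap(header).order(ByteOrder.BIG_ENDIAN);
        int numFrames = buf.getInt();
        int samplePeriod = buf.getInt();
        int bytesPerFrame = buf.getShort() & 0xFFFF;
        int parmKind = buf.getShort() & 0xFFFF;
        return new HtkHeaderCheck(numFrames, samplePeriod, bytesPerFrame, parmKind);
    }

    // Assumption: uncompressed features stored as 4-byte floats.
    int vectorLength() {
        return bytesPerFrame / 4;
    }

    // HTK times are in 100 ns units, so 100000 corresponds to 10 ms.
    double frameShiftMillis() {
        return samplePeriod / 10000.0;
    }

    public static void main(String[] args) throws IOException {
        String path = args.length > 0 ? args[0] : "input.mfc";
        byte[] header = new byte[12];
        try (DataInputStream in = new DataInputStream(new FileInputStream(path))) {
            in.readFully(header);
        }
        HtkHeaderCheck h = parse(header);
        System.out.println("frames: " + h.numFrames);
        System.out.println("frame shift (ms): " + h.frameShiftMillis());
        System.out.println("vector length: " + h.vectorLength());
        if (h.vectorLength() != 39) {
            System.out.println("Warning: vector length differs from cepstrumLength=39");
        }
    }
}
```

Compile and run it with javac HtkHeaderCheck.java && java HtkHeaderCheck input.mfc. If the vector length is not 39 or the frame shift is not 10 ms, the HCopy config probably differs from the one above.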

7. Now everything is ready

ant && java -jar bin/LatticeDemo.jar


Check the result

I heard: once or a zero zero one nine oh to one oh say or oil days or a jury


It's not very accurate, but still OK for such a small model and a limited language model.

This is still a work in progress and a lot of things are still pending. The most important are reading binary HTK files, frontend adaptation, cleanup and unification. But I really look forward to the results, since it's a promising approach. There are not many BSD-licensed HTK decoders out there.

8 comments:

dhd said...

Hey, the C front-end in sphinxbase can generate HTK features although they need to be rearranged a bit... The options corresponding to the HTK config above are:

-round_filters no
-unit_area no
-remove_dc yes
-transform htk
-lifter 22
-nfilt 26
-lowerf 1
-upperf 8000

Note, however, that liftering does absolutely nothing if you use CMN. Likewise, -unit_area makes no difference either in this case.

nshmyrev said...

Hi David, thanks. Yes, I saw your page; everything should be rather easy to set up. I've actually tried to reproduce the correct settings, but I'm not sure why it doesn't work for sphinx4 right now.

Rasika said...

Hi,
I'm working on a project that does speech recognition for the Sinhala language. Since I have an HTK acoustic model that was previously trained (for use in the Julius engine), I'm trying to use that acoustic model to implement speech recognition in Sphinx4. The acoustic model consists of two files, hmmdefs and tiedlist. As you described in step 4, I changed my MyApp.config.xml file to point to the hmmdefs file, and I made the first change from step 5 too. Since I'm new to Sphinx4, it's a bit hard to understand where to change the Step 5 Java file and Step 6. Can you please explain a bit?

Thanks in advance !

Nickolay V. Shmyrev said...

Hello Rasika

> Since I'm new to Sphinx4 its bit hard to understand where to change Step5 Java file

You have java code which you run, don't you? It may be Transcriber demo for example. Transcriber demo sources are located in sphinx4/apps/edu/cmu/sphinx/demo/transcriber/Transcriber.java.

> and Step6.

What about step6?

Rasika said...

Hi Nickolay, Thanks a lot for your reply !
As I understood by further reading your post and the reply ;
We can point a new acoustic model (hmmdefs file) to be used in our application as Step4.
But right now HTK generated acoustic models can recognize .mfc inputs only,not live inputs like in HelloWorld example. Which you described in step 5, 6 and 7.

Have I got it right?

Please recommend a guide/tutorial for acoustic model building and training for Sphinx4/PocketSphinx

Nickolay V. Shmyrev said...

> But right now HTK generated acoustic models can recognize .mfc inputs only,not live inputs like in HelloWorld example. Which you described in step 5, 6 and 7.

Not really, you can configure the frontend pipeline to match the HTK one. It's just not covered in the post. You need to modify the MelFilterBank params and use DCT2 instead of DCT. However, it's not guaranteed you will get the same accuracy.

> Please recommend a guide/tutorial for acoustic model building and training for Sphinx4/PocketSphinx

http://cmusphinx.sourceforge.net/wiki/tutorial

Sriram Shankar said...

I tried the above example, but I get the following output:

Getting config.xml
Loading...
Frames: 821
HTK loading...
HTK loading finished
HTK -> S4 conversion finished
17:38:10.356 WARNING dictionary Missing word:
17:38:11.090 WARNING dictionary Missing word:
The result of the recognition was null.

I have not done anything different from what has been indicated. I am not able to figure why I am getting the above warning. Any help is most welcome!

Nickolay V. Shmyrev said...

Hello Sriram

I've just repeated all the steps from a fresh trunk and got successful output. I only found a few issues with the XML config above (mangled tags), which I'm going to fix right now. Maybe you did something wrong, so just try it again. Here is the output I get:

Loading...
HTK loading...
HTK loading finished
HTK -> S4 conversion finished
03:55:11.899 WARNING dictionary Missing word: <sil>
03:55:13.173 WARNING dictionary Missing word: <sil>
Frames: 829
I heard: once or a zero zero one nine oh to one oh say or oil days or a jury
