5. Part 4: Train a model from scratch, and evaluate it on various digit test sets

In this section, we run our trainer/decoder on some larger data sets and look at continuous digit data (consisting of multiple connected digits per utterance) in addition to isolated digits.

First, let us see how our HMM/GMM system compares to the DTW system we developed in Lab 1 on isolated digits. We created a test set consisting of 11 isolated digits from each of 56 test speakers, and ran DTW using a single template for each digit from a pool of 56 training speakers (using a different training speaker for each test speaker). This yielded an error rate of 18.8%.
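As a reminder of what the DTW baseline does, here is a minimal sketch of single-template DTW classification. It is not the Lab 1 code: the function names `dtw_distance` and `classify`, the Euclidean frame distance, and the three-way step pattern are all assumptions made for illustration.

```python
import math

def dtw_distance(a, b):
    """Accumulated DTW cost between two sequences of feature vectors.

    Assumes a Euclidean local distance and the basic step pattern
    (vertical, horizontal, diagonal); Lab 1 may have used other choices.
    """
    n, m = len(a), len(b)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])  # frame-to-frame distance
            cost[i][j] = d + min(cost[i - 1][j],      # stay on template frame j
                                 cost[i][j - 1],      # stay on test frame i
                                 cost[i - 1][j - 1])  # advance both
    return cost[n][m]

def classify(utterance, templates):
    """Return the label of the template with the lowest DTW cost.

    `templates` maps each digit label to ONE template, mirroring the
    single-template setup described above.
    """
    return min(templates, key=lambda label: dtw_distance(utterance, templates[label]))

# Toy one-dimensional "features", purely hypothetical data.
templates = {"one": [[0.0], [1.0], [2.0]], "two": [[5.0], [5.0], [5.0]]}
test_utterance = [[0.0], [0.0], [1.0], [2.0]]  # a time-stretched "one"
print(classify(test_utterance, templates))
```

Because each digit is represented by a single template from a single training speaker, the classifier has no way to model speaker variability, which is one reason to expect the HMM/GMM system to do better as training data grows.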

Run the script lab2p4a.sh.
This script first trains a model on 100 isolated digit utterances (with five iterations of the Forward-Backward algorithm, FB) and decodes the same test set as above. It then repeats the procedure with 300 training utterances, and again with 1000. Note how the word-error rate varies with training set size, and compare each result with the 18.8% DTW error rate. The trained models are saved in files whose names begin with the prefix lab2p4a.
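For reference, word-error rate is conventionally computed as the number of substitutions, deletions, and insertions needed to turn the decoder output into the reference transcript, divided by the number of reference words. The sketch below illustrates that convention; it is not the lab's scoring code, and the names `edit_distance` and `word_error_rate` are hypothetical.

```python
def edit_distance(ref, hyp):
    """Minimum number of substitutions, deletions, and insertions
    needed to turn the word list `hyp` into the word list `ref`."""
    prev = list(range(len(hyp) + 1))  # distances against an empty reference
    for i, r in enumerate(ref, 1):
        curr = [i]  # empty hypothesis: i deletions
        for j, h in enumerate(hyp, 1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (r != h)))  # match or substitution
        prev = curr
    return prev[-1]

def word_error_rate(refs, hyps):
    """Total edit errors over all utterances divided by total reference words."""
    errors = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    words = sum(len(r) for r in refs)
    return errors / words

# Isolated digits: one word per utterance, so only substitutions occur here.
iso_refs = [["one"], ["two"], ["three"], ["four"]]
iso_hyps = [["one"], ["two"], ["nine"], ["four"]]
print(word_error_rate(iso_refs, iso_hyps))

# Connected digits: the decoder can also drop or insert words.
con_refs = ["one two three four".split()]
con_hyps = ["one three four".split()]
print(word_error_rate(con_refs, con_hyps))
```

For isolated digits, where each utterance contains exactly one word and the decoder outputs exactly one word, this reduces to the fraction of misrecognized utterances. For connected digit strings, deletions and insertions also contribute.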

Next, run the following script:
This script takes the 300-utterance isolated-digit model produced by lab2p4a.sh and uses it to decode connected digit strings (rather than isolated digits). It then trains a second model on 300 connected digit sequences and decodes the same test set, so that the two results can be compared directly.