6. Part 5: Evaluate the Performance of Various Models on WSJ Test Data

In this section, we will decode a small 10-utterance WSJ test set using our big static decoding graph (yay!). Before starting this part, recompile your code with optimization enabled so things will run faster. Enter the following commands:
smk -all -O2 -DNDEBUG Lab4_DP.o
smk DcdLab4

First, let us look into the impact of various modeling decisions. Our baseline system contains 3k 8-component GMMs running on MFCCs with deltas and delta-deltas. Let's see what happens if we reduce the number of Gaussians per mixture; run the following scripts:
lab4p5.gmm8.sh
lab4p5.gmm4.sh
lab4p5.gmm2.sh
lab4p5.gmm1.sh
These scripts correspond to 8-component, 4-component, 2-component, and single-Gaussian GMMs, respectively. (We saved the intermediate models as we did mixture splitting to go from 1 component to 8 components per GMM.) Remember to note the error rate of each run.
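To make the mixture-splitting remark concrete, here is a minimal sketch of one common splitting scheme: each Gaussian is cloned into two components whose means are perturbed by a fraction of a standard deviation, with the weight shared evenly. The function name and the perturbation factor (0.2) are illustrative assumptions, not the lab's actual training code.

```python
# Sketch of mixture splitting for a diagonal-covariance GMM (illustrative;
# names and the 0.2-stddev perturbation are assumptions, not the lab's code).
import numpy as np

def split_mixture(weights, means, variances, eps=0.2):
    """Double the component count by splitting each Gaussian in two.

    Each mean is perturbed by +/- eps standard deviations, the variance
    is copied, and the weight is halved.
    """
    new_w, new_mu, new_var = [], [], []
    for w, mu, var in zip(weights, means, variances):
        delta = eps * np.sqrt(var)          # per-dimension perturbation
        new_w.extend([w / 2.0, w / 2.0])    # split the weight evenly
        new_mu.extend([mu - delta, mu + delta])
        new_var.extend([var.copy(), var.copy()])
    return np.array(new_w), np.array(new_mu), np.array(new_var)

# Start from a single Gaussian and split three times: 1 -> 2 -> 4 -> 8,
# saving the intermediate model at each stage (as the lab's scripts do).
w = np.array([1.0])
mu = np.array([[0.0, 0.0]])
var = np.array([[1.0, 1.0]])
for _ in range(3):
    w, mu, var = split_mixture(w, mu, var)
print(len(w))             # 8 components
print(round(w.sum(), 6))  # weights still sum to 1.0
```

After each split, the model would normally be re-estimated with a few EM iterations before splitting again; the intermediate 1-, 2-, and 4-component models are exactly what the scripts above decode with.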

Now, let's see how much deltas and delta-deltas help. Run the following scripts:
lab4p5.mfccdd.sh
lab4p5.mfccd.sh
lab4p5.mfcc.sh
The first run uses MFCCs with deltas and delta-deltas, the second uses MFCCs with deltas only, and the last uses MFCCs alone. (Instead of retraining each model from scratch, we truncated the extra dimensions from each Gaussian.) These and all following runs use 8-component GMMs.
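Why is truncation a legitimate shortcut? With diagonal covariances, dropping dimensions from a Gaussian is the same as marginalizing them out, so the truncated model scores the reduced feature vector exactly as the full model would. The sketch below checks this; the dimension counts (39 total, 13 static MFCCs) are the usual front-end layout and an assumption about this lab.

```python
# Sketch: deriving an MFCC-only model from an MFCC+delta+delta-delta model
# by truncating each diagonal Gaussian to its first d dimensions.
# Illustrative; the 13/26/39 dimension layout is an assumption.
import numpy as np

def truncate_gaussian(mean, var, d):
    """Keep only the first d dimensions of a diagonal Gaussian."""
    return mean[:d], var[:d]

def diag_log_likelihood(x, mean, var):
    """Log density of x under a diagonal-covariance Gaussian."""
    return -0.5 * np.sum(np.log(2 * np.pi * var) + (x - mean) ** 2 / var)

rng = np.random.default_rng(0)
mean39 = rng.normal(size=39)                 # full 39-dim model
var39 = rng.uniform(0.5, 2.0, size=39)
mean13, var13 = truncate_gaussian(mean39, var39, 13)   # static MFCCs only

x = rng.normal(size=39)                      # one feature frame
# Scoring the truncated model on the first 13 dimensions equals
# marginalizing the full diagonal Gaussian down to those dimensions.
full_first13 = diag_log_likelihood(x[:13], mean39[:13], var39[:13])
truncated = diag_log_likelihood(x[:13], mean13, var13)
print(np.isclose(full_first13, truncated))
```

Note this equivalence relies on the diagonal covariance; with full covariances the marginal would require dropping rows and columns of the covariance matrix, not just truncating a variance vector.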

Finally, let's see how pruning affects performance. Run the following scripts:
lab4p5.10.none.sh
lab4p5.5.none.sh
lab4p5.2.none.sh
lab4p5.none.10k.sh
lab4p5.none.5k.sh
lab4p5.none.2k.sh
lab4p5.10.10k.sh
lab4p5.5.5k.sh
The first value in each script name is the beam width used in beam pruning; the second value is the rank threshold used in rank pruning ("none" means that type of pruning is disabled). For this part, note both the real-time factor and the WER.
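The two pruning criteria in the script names can be sketched as a single pass over the active hypotheses at one decoding frame: beam pruning discards anything whose log-probability falls too far below the current best, and rank pruning keeps at most a fixed number of the best hypotheses. The hypothesis representation and the toy numbers below are assumptions for illustration, not the decoder's actual data structures.

```python
# Sketch of beam pruning plus rank pruning at one decoding frame.
# Illustrative only; the (state, log_prob) representation is an assumption.

def prune(hyps, beam=None, rank=None):
    """Prune a list of (state, log_prob) hypotheses.

    beam: keep only hypotheses within `beam` of the best log-prob.
    rank: keep at most the `rank` best hypotheses.
    Either may be None, i.e. that pruning type is disabled
    (the "none" in the script names above).
    """
    hyps = sorted(hyps, key=lambda h: h[1], reverse=True)  # best first
    if beam is not None:
        best = hyps[0][1]
        hyps = [h for h in hyps if h[1] >= best - beam]
    if rank is not None:
        hyps = hyps[:rank]
    return hyps

active = [("s1", -10.0), ("s2", -12.5), ("s3", -25.0), ("s4", -11.0)]
print(prune(active, beam=5.0, rank=None))   # beam only: drops s3
print(prune(active, beam=None, rank=2))     # rank only: keeps the 2 best
print(prune(active, beam=5.0, rank=2))      # both criteria applied
```

Tightening either threshold lowers the real-time factor but risks pruning away the correct path, which is exactly the WER-versus-speed trade-off these runs are designed to expose.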