For this part and the next part of the lab, we'll have you implement the Forward-Backward algorithm for HMM/GMM training, but in stages, to hopefully make it easier.
That is, in order to do meaningful decoding as in Part 1, we need to have trained HMM and GMM parameters. In particular, we need to estimate the following parameters: the HMM arc probabilities, and the Gaussian means and variances. To be specific, our decoding vocabulary consists of twelve words (ONE to NINE, ZERO, OH, and silence) comprising a total of 34 phonemes. We use 3 HMM states for each phoneme (not including final states), giving us 102 states total. Each of these states has two outgoing arcs that we need to assign probabilities to, and a Gaussian that we need to estimate.
To see the transition probabilities and Gaussian parameters used in Part 1, look at the files p1.tprobs and p1.oprobs in ~stanchen/e6884/lab2/. The file p1.tprobs holds the outgoing transition probabilities for each of the 102 states: the self-loop probability followed by the exit probability. The mapping from words to state indices is arbitrary; e.g., the word EIGHT is assigned states 0 through 5, and the silence word is assigned states 99 to 101. (See p. 156 of Holmes for how to interpret transition probabilities.) In the file p1.oprobs, look for the <gaussians> line. After that line, alternating means and variances are listed for the Gaussian associated with each state's outgoing arcs.
Now, we can use the Forward-Backward algorithm to estimate these parameters. For example, we can initialize all parameters arbitrarily (e.g., transition probabilities of 0.5, means of 0, variances of 1). Then, we can iterate over our training data, reestimating our parameters after each iteration such that the likelihood our model assigns to the training data is guaranteed to increase (or at least not decrease) from one iteration to the next.
What we do in each iteration is the following:
- Initialize all counts to 0.
- Loop through each utterance in the training data:
  - Use a front end to produce feature vectors.
  - Create an HMM by splicing together the word HMMs for each word in the reference transcript of the utterance.
  - Run Forward-Backward on this HMM and the acoustic feature vectors to update the counts.
- Use the counts to reestimate our parameters.
Now, the Forward-Backward algorithm can be decomposed into two parts: computing the posterior probability of each arc a at each frame t, usually denoted something like gamma(a,t); and using these gamma(a,t) values to update the relevant counts. In Part 2 of the lab, we'll have you do the gamma(a,t) computation, and we'll supply the code that uses the gamma(a,t) values to update transition counts and probabilities. (Updating transition probabilities is quite simple: just sum and normalize.) In Part 3 of the lab, we'll have you write the code for updating Gaussian counts, and the code that uses these counts to update Gaussian means and variances.
In this part, you will be implementing the forward and backward passes of the Forward-Backward algorithm to calculate the posterior probability of each transition at each frame, but not the statistics collection and reestimation parts of the FB algorithm.
Your job in this part is to fill in the function ForwardBackwardLab2G() in Lab2_DP.C.
For this lab, you will only need to calculate the posterior counts of arcs, not states, since both transition probabilities and observations are located on arcs in this lab. Again, this convention agrees with the slides from Lecture 4, but not the readings, so be careful. More precisely, you need to calculate the values c_t(tr_ij | X) (another name for the gamma(a,t) values; see, e.g., slide 58 of Lecture 4); the arguments frm and posteriorCount passed to graph.add_count() should be the values t and c_t(tr_ij | X), respectively. Note: the function graph.add_count() expects a regular probability, not a log probability!
More specifically, you need to write code that fills in the forward (log) probability forwLogProbM for each cell in the dynamic programming chart. We provide code that computes the total (log) probability of the utterance, and code that initializes the backward (log) probability backLogProbM for each state at the last frame of the chart. Then, you need to fill in the code that does the rest of the backward pass and that computes the posterior counts. You must call the routine graph.add_count() with these posterior counts; this routine forwards the counts to the transition-count-updating code. Hint: read Section 9.12.2 (p. 154) of Holmes in the assigned readings. (Pedantic note: posterior counts below 0.001 are not forwarded, for efficiency's sake.)
Your code will be compiled into the training program TrainLab2. To compile this program with your code, type
smk TrainLab2
Then, run the script

lab2p2a.sh
To help with debugging, we have provided the file lab2p2a.debug, which contains portions of the DP chart for the first utterance when running lab2p2a.sh.
Once you think you have this part working, run the script
lab2p2b.sh
Finally, decode some (isolated digit) test data with these trained transition probabilities (and the original untrained observation probabilities) by running the script
lab2p2c.sh