For this part and the next part of the lab, we'll have you implement the Forward-Backward algorithm for HMM/GMM training, but in stages, to hopefully make it easier.
That is, in order to do meaningful decoding as in Part 1, we need to have trained HMM and GMM parameters. In particular, we need to estimate the following parameters: the HMM arc probabilities, and the Gaussian means and variances. To be specific, our decoding vocabulary consists of twelve words (ONE to NINE, ZERO, OH, and silence) comprising a total of 34 phonemes. We use 3 HMM states for each phoneme (not including final states), giving us 102 states total. Each of these states has two outgoing arcs that we need to assign probabilities to, and a Gaussian that we need to estimate.
To see the transition probabilities and Gaussian parameters used in Part 1, look at the files p1.tprobs and p1.oprobs in ~stanchen/e6884/lab2/. The file p1.tprobs holds the outgoing transition probabilities for each of the 102 states: the self-loop probability followed by the exit probability. The mapping from words to state indices is arbitrary; e.g., the word EIGHT is assigned states 0 through 5, and the silence word is assigned states 99 to 101. (See p. 156 of Holmes for how to interpret transition probabilities.) In the file p1.oprobs, look for the <gaussians> line. After that line, alternating means and variances are listed for the Gaussian associated with each state's outgoing arcs.
Now, we can use the Forward-Backward algorithm to estimate these parameters. For example, we can initialize all parameters arbitrarily (e.g., transition probabilities of 0.5, means of 0, variances of 1). Then, we can iterate over our training data, reestimating our parameters after each iteration such that the likelihood our model assigns to the training data is guaranteed to increase (or at least not decrease) from one iteration to the next.
What we do in each iteration is the following:
- Initialize all counts to 0.
- Loop through each utterance in the training data:
  - Use a front end to produce feature vectors.
  - Create an HMM by splicing together the word HMMs for each word in the reference transcript of the utterance.
  - Run Forward-Backward on this HMM and the acoustic feature vectors to update the counts.
- Use the counts to reestimate our parameters.
Now, the Forward-Backward algorithm can be decomposed into two parts: computing the posterior probability of each arc a at each frame t, usually denoted something like gamma(a,t); and using these gamma(a,t) values to update the relevant counts. In Part 2 of the lab, we'll have you do the gamma(a,t) computation, and we'll supply the code that uses the gamma(a,t) values to update transition counts and probabilities. (Updating transition probabilities is quite simple: just sum and normalize.) In Part 3 of the lab, we'll have you write the code for updating Gaussian counts, and the code that uses these counts to update Gaussian means and variances.
In this part, you will be implementing the forward and backward passes of the Forward-Backward algorithm to calculate the posterior probability of each transition at each frame, but not the statistics collection and reestimation parts of the FB algorithm.
Your job in this part is to fill in the function ForwardBackwardLab2G() in Lab2_DP.C.
For this lab, you will only need to calculate the posterior counts of arcs, not states, since both transition probabilities and observations are located on arcs in this lab. Again, this convention agrees with the slides from Lecture 4, but not the readings, so be careful. More precisely, you need to calculate the values c_t(tr_ij | X) (another name for the gamma(a,t) values; see, e.g., slide 58 of Lecture 4); the arguments frm and posteriorCount passed to graph.add_count() should be the values t and c_t(tr_ij | X), respectively. Note: the function graph.add_count() expects a regular probability, not a log probability!
More specifically, you need to write code that fills in the forward (log) probability forwLogProbM for each cell in the dynamic programming chart. We provide code that computes the total (log) probability of the utterance, and code that initializes the backward (log) probability backLogProbM for each state at the last frame of the chart. Then, you need to fill in the code that does the rest of the backward pass and that computes the posterior counts. You must call the routine graph.add_count() with these posterior counts; this routine forwards the counts to the transition-count-updating code. Hint: read Section 9.12.2 (p. 154) of Holmes in the assigned readings. (Pedantic note: posterior counts below 0.001 are not forwarded, for efficiency's sake.)
Your code will be compiled into the training program TrainLab2. To compile this program with your code, type
smk TrainLab2
Then, run the script

lab2p2a.sh
To help with debugging, we have provided the file lab2p2a.debug, which contains portions of the DP chart for the first utterance when running lab2p2a.sh.
Once you think you have this part working, run the script
lab2p2b.sh
Finally, decode some (isolated digit) test data with these trained transition probabilities (and the original untrained observation probabilities) by running the script
lab2p2c.sh