4. Part 3: Add Support for Word Traceback and Skip Arcs to a Large-Vocabulary Decoder

In this part of the lab, we provide a nearly complete large-vocabulary decoder, and you will need to add support for recording word traceback information as well as support for handling skip arcs, i.e., arcs with no output. Most of the work has already been done; these changes should involve adding or modifying only a few lines of code.

Now, we outline the basic functioning of the code. Recall the Viterbi algorithm we implemented for Lab 2:

C[0, start].vProb = 1
for t in [0 ... (T-1)]:
  for s_src in [1 ... S]:
    for a in outArcs(s_src):
      s_dst = dest(a)
      curProb = C[t, s_src].vProb * arcProb(a, t)
      if curProb > C[t+1, s_dst].vProb:
        C[t+1, s_dst].vProb = curProb
        C[t+1, s_dst].trace = a
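The pseudocode above can be rendered as runnable C++ roughly as follows. This is only a sketch with hypothetical types (Cell, Arc, and the probs array are not from the lab code); note that the full chart C[t][s] is preallocated here, which is exactly what becomes infeasible for large vocabularies:

```cpp
#include <vector>

// Hypothetical types standing in for the Lab 2 chart and graph.
struct Cell { double vProb = 0.0; int trace = -1; };
struct Arc { int src, dst; };

// probs[t][a] = probability contribution of arc a at frame t (assumed given).
// Returns the filled chart; chart[t][s] corresponds to C[t, s] above.
std::vector<std::vector<Cell>> viterbi(
        int numFrames, int numStates, int startState,
        const std::vector<Arc>& arcs,
        const std::vector<std::vector<double>>& probs) {
    std::vector<std::vector<Cell>> chart(
        numFrames + 1, std::vector<Cell>(numStates));
    chart[0][startState].vProb = 1.0;
    for (int t = 0; t < numFrames; ++t)
        for (int a = 0; a < (int)arcs.size(); ++a) {
            double cur = chart[t][arcs[a].src].vProb * probs[t][a];
            if (cur > chart[t + 1][arcs[a].dst].vProb) {
                chart[t + 1][arcs[a].dst].vProb = cur;
                chart[t + 1][arcs[a].dst].trace = a;   // best incoming arc
            }
        }
    return chart;
}
```

(The pseudocode loops over outArcs(s_src); looping over all arcs, as above, is equivalent.)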
Now, look at the main function ViterbiLab4G in Lab4_DP.C, which executes the Viterbi algorithm for a single utterance. As mentioned in [Lecture 8], it is not generally feasible to store the whole dynamic programming chart C[t, s] in large-vocabulary decoding. Instead, we store only the active states for the last and current frames. For each frame, we store the active states in a FrameData structure. We allocate two of these structures, and initially set the variable lastFrame to point to one and nextFrame to point to the other. For each iteration in our outer loop over frames:
for (int frm = 0; frm <= numFrames; ++frm)
the variable lastFrame holds the active states for frame frm and nextFrame holds the active states that we are creating for frame frm+1.[1] At the end of the iteration, we swap lastFrame and nextFrame, and at the beginning of the next iteration, we clear nextFrame, thus setting us up correctly for the next iteration.
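The double-buffering scheme just described can be sketched as follows; FrameData here is a hypothetical stand-in (just a map from state index to log prob), not the lab's actual structure:

```cpp
#include <map>
#include <utility>

// Hypothetical minimal stand-in for the decoder's FrameData structure.
struct FrameData {
    std::map<int, double> cells;   // state index -> Viterbi log prob
    void clear() { cells.clear(); }
};

// Sketch of the outer loop's buffer handling: lastFrame holds the active
// states of frame frm, nextFrame collects those of frame frm+1, and the
// two pointers are swapped at the end of every iteration. Returns the
// number of active states left in the final "last" frame, for checking.
int run_frames(int numFrames) {
    FrameData bufA, bufB;
    FrameData* lastFrame = &bufA;
    FrameData* nextFrame = &bufB;
    lastFrame->cells[0] = 0.0;                   // start state active at frame 0
    for (int frm = 0; frm <= numFrames; ++frm) {
        nextFrame->clear();                      // fresh buffer for frame frm+1
        for (const auto& cell : lastFrame->cells)
            nextFrame->cells[cell.first + 1] = cell.second;  // dummy "expansion"
        std::swap(lastFrame, nextFrame);         // frm+1 becomes the last frame
    }
    return static_cast<int>(lastFrame->cells.size());
}
```

Only two FrameData buffers ever exist; swapping pointers avoids copying any cells between frames.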

Now, in Lab 2, we iterated over all states, but for LVCSR we can only iterate over active states. In addition, we must iterate over active states in topological order (relative to skip arcs) in order for the dynamic programming calculation to be correct, as mentioned in class. We handle this for you, using a data structure known as a [heap]. A heap holds a list of elements such that the top of the heap always holds the first element according to some ordering, in our case, the topological ordering. In the code, the loop over active states is implemented by first calling lastFrame->init_heap() to initialize the heap for active states in the last frame, and then doing
while ((curState = lastFrame->pop_heap()))
to pop states off the top of the heap until the heap is empty. In our decoder, we assume that the static decoding graph has already been topologically sorted relative to skip arcs, so topological state ordering will be the same as numeric order.
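Since topological order coincides with numeric order in our sorted graphs, the heap behavior amounts to popping active state indices smallest-first. A minimal illustration of that idea using a standard min-heap (this is just the concept; the real code uses lastFrame->init_heap() and lastFrame->pop_heap()):

```cpp
#include <functional>
#include <queue>
#include <vector>

// Pop active states in increasing numeric order, which (for a topologically
// sorted decoding graph) is a valid topological order.
std::vector<int> pop_in_topological_order(const std::vector<int>& activeStates) {
    std::priority_queue<int, std::vector<int>, std::greater<int>> heap(
        activeStates.begin(), activeStates.end());
    std::vector<int> order;
    while (!heap.empty()) {
        order.push_back(heap.top());   // smallest (earliest) state first
        heap.pop();
    }
    return order;
}
```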

To create or look up a DP cell for a state, we use the method insert_cell(). That is, we no longer preallocate all cells C[t, s] in the DP chart for a frame t; instead, cells are created on the fly as new states become active. For example, see the line
ChartCell& dstCell = nextFrame->insert_cell(dstState);
If you don't understand the “&” notation, you can do the following instead:
ChartCell* dstCellPtr = &nextFrame->insert_cell(dstState);
If a cell for state dstState already exists in nextFrame, this cell is returned; otherwise, a new cell is created (and its logprob is initialized to zeroLogProbS). This covers the main differences between the Viterbi implementations for Lab 2 and Lab 4.
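The semantics of insert_cell() can be sketched with a hash map; everything here is hypothetical (the names ChartCell, FrameData, and the sentinel value of zeroLogProbS mimic the lab code but are not taken from it):

```cpp
#include <unordered_map>

// Stand-in for the lab's "log prob of zero" sentinel (actual value differs).
const double zeroLogProbS = -1e30;

struct ChartCell {
    double logProb = zeroLogProbS;   // new cells start at log prob "zero"
    int wordTreeNodeM = -1;          // backtrace info, filled in by your code
};

struct FrameData {
    std::unordered_map<int, ChartCell> cells;
    // Returns the existing cell for a state, or creates a fresh one on the fly;
    // operator[] default-constructs a ChartCell exactly when the key is new.
    ChartCell& insert_cell(int state) { return cells[state]; }
};
```

Returning a reference means the caller can update the stored cell in place, which is why the original line binds the result to a ChartCell&.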

Figure 1. A Backtrace Word Tree

Now, the first task for Part 3 is to add backtrace updating; the code as it stands only calculates the Viterbi probability but not the best word sequence. Unlike in previous labs, we do not provide BEGIN_LAB and END_LAB markers; you need to figure out where to make the changes yourself. Backtrace information is stored in the wordTreeNodeM field of the ChartCell structure. That is, in contrast to [Lab 2] where we stored the best incoming arc for each cell in the chart, we now store the word sequence labeling the best incoming path for each cell in the dynamic programming chart. As described in class, we encode the best word sequence as a single integer index that specifies a node in what we call a word tree. An example word tree is depicted in Figure 1. The word sequence associated with a node is the sequence of words labeling the path from the root node to that node. For example, with the word tree in Figure 1, if a cell has a wordTreeNodeM value of 5, this signals that the best path to this cell is labeled with the word sequence THE DOG ATE.

For this part of the lab, you will need to set the wordTreeNodeM field for all cells iterated over and to update the word tree as you go, creating new nodes as needed. The word tree is stored in a variable wordTree of type WordTree. (Note: In the code, the root has index 0 rather than 1 as in Figure 1.) Initially, the word tree will consist of a single node, the root node. To complete this task, you will need to access the following method of WordTree: wordTree.insert_node(prevNode, nextWord). This returns the index of the node you arrive at if you extend the node prevNode with the word nextWord (represented as an integer index), creating the node if it doesn't already exist. (Recall from Lab 3 that we can represent word identities as integer indices.) For example, wordTree.insert_node(5, nextWord) would return the value 8 in the above example if nextWord corresponds to the word MY. To find the word label residing on an arc arcPtr, call the method arcPtr->get_label(). This returns the integer index of the word residing on the arc, or 0 if there is no word label. Note: most arcs do not have word labels on them.
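To make the insert_node() contract concrete, here is one hypothetical way a WordTree could be implemented (a sketch of the interface's behavior, not the lab's actual implementation):

```cpp
#include <map>
#include <utility>

// Hypothetical WordTree sketch: each node is reached from its parent by a
// word index, and node 0 is the root (the empty word sequence), matching
// the code's convention.
class WordTree {
  public:
    WordTree() : nodeCount_(1) {}   // node 0 = root

    // Returns the child of prevNode labeled nextWord, creating the node if
    // it does not already exist; repeated calls return the same index.
    int insert_node(int prevNode, int nextWord) {
        auto key = std::make_pair(prevNode, nextWord);
        auto it = children_.find(key);
        if (it != children_.end()) return it->second;  // node already exists
        int newNode = nodeCount_++;
        children_[key] = newNode;
        return newNode;
    }

  private:
    std::map<std::pair<int, int>, int> children_;  // (parent, word) -> child
    int nodeCount_;
};
```

Because existing (parent, word) pairs are reused, paths that share a word prefix share nodes, which keeps the backtrace storage compact.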

To complete this task, you should need only write a few lines of code. Hint: if the best path into dstCell comes from curCell and the arc between them has no word label, how does the wordTreeNodeM value of dstCell compare to that of curCell? What if the arc between them does have a word label?

Your code will be compiled into the program DcdLab4, which is almost identical to the program DcdLab2 from [Lab 2], except that it links in the Lab 4 Viterbi implementation instead of the Lab 2 Viterbi implementation. To compile this program with your code, type
smk DcdLab4
To run this program on a small test set of Wall Street Journal utterances using the models described in Section 3.1, run
lab4p3a.sh
In this part, for speed's sake we are not using our big static decoding graph. Instead, we create an HMM consisting of all “word” sequences consistent with the reference transcript as described in slide 45 in [Lecture 7]. This HMM can be used to produce an “extended” transcript for an utterance, one that includes silences and pronunciation variant information. In this scenario, the decoded output will always contain the same “words” as the reference transcript.

The target output can be found in the file p3a.out in ~stanchen/e6884/lab4/; you should match the decoded word sequence printed for each utterance. As usual, the -debug flag can be used to run the script in the debugger. The instructions in lab4.txt will ask you to run the script lab4p3b.sh, which does the same thing as lab4p3a.sh except on a different test set.

For the second task of this part, you need to add support for skip arcs, i.e., arcs with no output. That is, the code as is will not work correctly if skip arcs are present, and you need to find the fix that will make the code work correctly. Note that in the loop over arcs in the code, the variable hasOutput will be false for skip arcs. For this part, you need only change a single line of code; the code is designed so that the obvious change will result in the correct behavior. Again, you need to recompile DcdLab4 to test your change:
smk DcdLab4
To run this program on a small sample test set, run
lab4p3c.sh
This uses exactly the same models and test set as lab4p3a.sh, except that we modified the graph expansion process to introduce unnecessary skip arcs in some places. Target output can be found in the file p3c.out in ~stanchen/e6884/lab4/. You should try to match the logprob for each utterance exactly. The instructions in lab4.txt will ask you to run the script lab4p3d.sh, which does the same thing as lab4p3c.sh except on a different test set.

Notes

[1]

You may notice that the loop over frames goes to numFrames inclusive, which seems one frame too far. We iterate over this final frame to process skip arcs only; emitting arcs are skipped for this frame.