
1.5 Recognition and Viterbi Decoding

 

The previous section described the basic ideas underlying HMM parameter re-estimation using the Baum-Welch algorithm. In passing, it was noted that the efficient recursive algorithm for computing the forward probability also yielded, as a by-product, the total likelihood $P(\mathbf{O}|M)$. Thus, this algorithm could also be used to find the model which yields the maximum value of $P(\mathbf{O}|M_i)$, and hence, it could be used for recognition.

In practice, however, it is preferable to base recognition on the maximum likelihood state sequence since this generalises easily to the continuous speech case whereas the use of the total probability does not. This likelihood is computed using essentially the same algorithm as the forward probability calculation except that the summation is replaced by a maximum operation. For a given model $M$, let $\phi_j(t)$ represent the maximum likelihood of observing speech vectors $\mathbf{o}_1$ to $\mathbf{o}_t$ and being in state $j$ at time $t$. This partial likelihood can be computed efficiently using the following recursion (cf. equation 1.16)

\[
\phi_j(t) = \max_i \left\{ \phi_i(t-1) \, a_{ij} \right\} b_j(\mathbf{o}_t)  \qquad (1.27)
\]

where

\[
\phi_1(1) = 1  \qquad (1.28)
\]

\[
\phi_j(1) = a_{1j} b_j(\mathbf{o}_1)  \qquad (1.29)
\]

for $1 < j < N$. The maximum likelihood $\hat{P}(\mathbf{O}|M)$ is then given by

\[
\hat{P}(\mathbf{O}|M) = \phi_N(T) = \max_i \left\{ \phi_i(T) \, a_{iN} \right\}  \qquad (1.30)
\]
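As a concrete illustration of equations 1.27 to 1.30, consider a hypothetical model with $N=4$ (states 1 and 4 non-emitting), a two-frame observation sequence, and invented values $a_{12}=0.8$, $a_{13}=0.2$, $a_{22}=0.5$, $a_{23}=0.4$, $a_{24}=0.1$, $a_{33}=0.6$, $a_{34}=0.4$, $b_2(\mathbf{o}_1)=0.5$, $b_3(\mathbf{o}_1)=0.1$, $b_2(\mathbf{o}_2)=0.3$ and $b_3(\mathbf{o}_2)=0.7$. The recursion then gives
\[
\begin{aligned}
\phi_2(1) &= 0.8 \times 0.5 = 0.4, & \phi_3(1) &= 0.2 \times 0.1 = 0.02, \\
\phi_2(2) &= (0.4 \times 0.5) \times 0.3 = 0.06, & \phi_3(2) &= \max\{0.4 \times 0.4,\; 0.02 \times 0.6\} \times 0.7 = 0.112,
\end{aligned}
\]
and the termination step yields $\hat{P}(\mathbf{O}|M) = \max\{0.06 \times 0.1,\; 0.112 \times 0.4\} = 0.0448$, corresponding to the state sequence $1 \rightarrow 2 \rightarrow 3 \rightarrow 4$.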

As for the re-estimation case, the direct computation of likelihoods leads to underflow; hence, log likelihoods are used instead. The recursion of equation 1.27 then becomes

\[
\psi_j(t) = \max_i \left\{ \psi_i(t-1) + \log a_{ij} \right\} + \log b_j(\mathbf{o}_t)  \qquad (1.31)
\]

This recursion forms the basis of the so-called Viterbi algorithm. As shown in Fig. 1.6, this algorithm can be visualised as finding the best path through a matrix where the vertical dimension represents the states of the HMM and the horizontal dimension represents the frames of speech (i.e. time). Each large dot in the picture represents the log probability of observing that frame at that time and each arc between dots corresponds to a log transition probability. The log probability of any path is computed simply by summing the log transition probabilities and the log output probabilities along that path. The paths are grown from left-to-right, column-by-column. At time $t$, each partial path $\psi_i(t)$ is known for all states $i$; hence equation 1.31 can be used to compute $\psi_j(t+1)$, thereby extending the partial paths by one time frame.
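The following is a minimal sketch of this recursion in Python, assuming the HTK convention of a non-emitting entry state and a non-emitting exit state; the function name, data layout and flooring constant are illustrative and not part of HTK.

    import math

    LOG_ZERO = -1e10  # floor standing in for log(0)

    def safe_log(x):
        return math.log(x) if x > 0 else LOG_ZERO

    def viterbi_log_likelihood(A, b, T):
        """Log-domain Viterbi recursion for one model.

        A[i][j] -- transition probabilities for states 0..N-1, where
                   state 0 is the non-emitting entry state and state
                   N-1 the non-emitting exit state.
        b(j, t) -- output probability b_j(o_t) for emitting state j
                   and frame t (1-based), supplied by the caller.
        T       -- number of observation frames.
        Returns the log of the maximum likelihood over state sequences.
        """
        N = len(A)
        emitting = range(1, N - 1)
        # Initialisation (equations 1.28 and 1.29, in the log domain)
        psi = {j: safe_log(A[0][j]) + safe_log(b(j, 1)) for j in emitting}
        # Recursion (equation 1.31)
        for t in range(2, T + 1):
            psi = {j: max(psi[i] + safe_log(A[i][j]) for i in emitting)
                      + safe_log(b(j, t))
                   for j in emitting}
        # Termination (equation 1.30, in the log domain)
        return max(psi[i] + safe_log(A[i][N - 1]) for i in emitting)

Applied to the hypothetical numbers in the worked example above, this function returns $\log 0.0448$. Isolated word recognition then amounts to evaluating such a function once per word model and selecting the model with the highest score; a full decoder would also record the maximising predecessor at each time step so that the best state sequence can be recovered by trace-back.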

Fig. 1.6 The Viterbi Algorithm for Isolated Word Recognition

This concept of a path is extremely important and it is generalised below to deal with the continuous speech case.

This completes the discussion of isolated word recognition using HMMs. There is no HTK tool which implements the above Viterbi algorithm directly. Instead, a tool called HVITE is provided which, along with its supporting libraries HNET and HREC, is designed to handle continuous speech. Since this recogniser is syntax directed, it can also perform isolated word recognition as a special case. This is discussed in more detail below.
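For orientation, a typical HVITE invocation for recognising a list of coded test files has the following general shape, where all file names are placeholders and the full set of options is described in the reference part of this book:

    HVite -H hmmdefs -S test.scp -i results.mlf -w wdnet dict hmmlist

Here -H loads the HMM definitions, -S lists the files to recognise, -i names the output transcription file, -w supplies the word network that directs the syntax, and the final two arguments are the pronunciation dictionary and the list of HMMs.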


