
10.3 Tied Mixture Systems

 

Discrete systems have the advantage of low run-time computation. However, vector quantisation reduces accuracy and this can lead to poor performance. As an intermediate between discrete and continuous density systems, a fully tied-mixture system can be used. Tied mixtures are conceptually just another example of the general parameter tying mechanism built into HTK. However, to use them effectively in speech recognition systems, a number of storage and computational optimisations must be made. Hence, they are given special treatment in HTK.

When specific mixtures are tied as in

     TI "mix" {*.state[2].mix[1]}
then a Gaussian mixture component is shared across all of the owners of the tie. In this example, all models will share the same Gaussian for the first mixture component of state 2. However, if the mixture component index is missing, then all of the mixture components participating in the tie are joined rather than tied. More specifically, the commands
     JO 128 2.0
     TI "mix" {*.state[2-4].mix}
have the following effect. All of the mixture components in states 2 to 4 of all models are collected into a pool. If the number of components in the pool exceeds 128, as set by the preceding join command JO, then components with the smallest weights are removed until the pool size is exactly 128. Similarly, if the size of the initial pool is less than 128, then mixture components are split using the same algorithm as for the Mix-Up MU command. All states then share all of the mixture components in this pool. The new mixture weights are chosen to be proportional to the log probability of the corresponding new mixture component mean with respect to the original distribution for that state. The log is used here to give a wider spread of mixture weights. All mixture weights are floored to the value of the second argument of the JO command times MINMIX.
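As a sketch of this weight assignment in notation introduced here (none of these symbols appear in the original text): if $b_s(\cdot)$ denotes the original output distribution of state $s$ and $\mu_m$ is the mean of the $m$-th pooled component, then the new weights are set roughly as

\[
c_{sm} \propto \log b_s(\mu_m), \qquad c_{sm} \geq 2.0 \times \mathtt{MINMIX},
\]

where 2.0 is the second argument of the JO command above and the weights are subsequently renormalised to sum to one.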

The net effect of the above two commands is to create a set of tied-mixture HMMs where the same set of mixture components is shared across all states of all models. However, the type of the HMM set so created will still be SHARED and the internal representation will be the same as for any other set of parameter tyings. To obtain the optimised representation of the tied-mixture weights described in section 7.5, the following HHED HK command must be issued

     HK TIEDHS
This will convert the internal representation to the special tied-mixture form in which all of the tied mixtures are stored in a global table and referenced implicitly instead of being referenced explicitly using pointers.
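In practice, HHED commands such as these are collected in an edit script and applied from the command line. A minimal sketch, in which the file names hk.hed, shared/hmmdefs, tiedhs and hmmlist are illustrative only:

     HHEd -H shared/hmmdefs -M tiedhs hk.hed hmmlist

Here hk.hed would contain the HK TIEDHS command, normally preceded by the JO and TI commands of the kind shown above (see the complete script later in this section).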

Tied-mixture HMMs work best if the information relating to different sources, such as delta coefficients and energy, is separated into distinct data streams. This can be done by setting up multiple data stream HMMs from the outset. However, it is simpler to use the SS command in HHED to split the data streams of the currently loaded HMM set. Thus, for example, the command

     SS 4
would convert the currently loaded HMMs to use four separate data streams rather than one. When used in the construction of tied-mixture HMMs, this is analogous to the use of multiple codebooks in discrete density HMMs.

The procedure for building a set of tied-mixture HMMs may be summarised as follows:

  1. Choose a codebook size for each data stream and then decide how many Gaussian components will be needed from an initial set of monophones to approximately fill this codebook. For example, suppose that there are 48 three-state monophones. If codebook sizes of 128 are chosen for streams 1 and 2, and a codebook size of 64 is chosen for stream 3, then single Gaussian monophones would provide enough mixtures in total to fill the codebooks, since 48 models with 3 emitting states each yield 144 components per stream.
  2. Train the initial set of monophones.
  3. Use HHED to first split the HMMs into the required number of data streams, tie each individual stream and then convert the tied-mixture HMM set to have the kind TIEDHS. A typical script to do this for four streams would be
        SS 4
        JO 256 2.0
        TI st1 {*.state[2-4].stream[1].mix}
        JO 128 2.0
        TI st2 {*.state[2-4].stream[2].mix}
        JO 128 2.0
        TI st3 {*.state[2-4].stream[3].mix}
        JO 64 2.0
        TI st4 {*.state[2-4].stream[4].mix}
        HK TIEDHS
  4. Re-estimate the models using HEREST in the normal way (a typical invocation is sketched below).
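A sketch of a typical re-estimation command for step 4, in which the file names tiedhs/hmmdefs (the converted models), train.scp (the list of training files), labels.mlf (the transcriptions) and monolist are illustrative only:

     HERest -H tiedhs/hmmdefs -M trained -S train.scp -I labels.mlf monolist

As usual, several passes of HEREST would normally be run, each one reading the models produced by the previous pass.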
Once the set of retrained tied-mixture models has been produced, context dependent models can be constructed using similar methods to those outlined previously.

When evaluating probabilities in tied-mixture systems, it is often sufficient to sum just the most likely mixture components since, for any particular input vector, its probability with respect to many of the Gaussian components will be very low. HTK tools recognise TIEDHS HMM sets as being special in the sense that additional optimisations are possible. When full tied-mixtures are used, an additional layer of pruning is applied. At each time frame, the log probability of the current observation is computed for each mixture component. Then only those components which lie within a threshold of the most likely component are retained. This pruning is controlled by the -c option in HREST, HEREST and HVITE.
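For example, a recognition run with tied-mixture pruning might be invoked as follows, where all file names are illustrative and 10.0 is simply an example threshold:

     HVite -c 10.0 -H trained/hmmdefs -w wdnet -S test.scp -i results.mlf dict tiedlist

Larger values of the -c threshold retain more mixture components per frame, trading computation for accuracy.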

