
9.3 Parameter Tying and Item Lists

 

As explained in Chapter 7, HTK uses macros to support a generalised parameter tying facility. Referring again to Fig. 7.8, each of the solid black circles denotes a potential tie-point in the hierarchy of HMM parameters. When two or more parameter sets are tied, the same set of parameter values is shared by all the owners of the tied set. Externally, tied parameters are represented by macros and internally they are represented by structure sharing. The accumulators needed for the numerators and denominators of the Baum-Welch re-estimation formulae given in section 8.7 are attached directly to the parameters themselves. Hence, when the values of a tied parameter set are re-estimated, all of the data which would have been used to estimate each individual untied parameter are effectively pooled, leading to more robust parameter estimation.
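
For example, if the transition matrices of several models were tied, the model file would contain a single ~t macro holding the shared matrix and each HMM definition would simply reference it by name rather than holding its own copy. The following sketch, written in the HMM definition language of Chapter 7, illustrates the idea; the macro name, model name, vector sizes and values are purely illustrative:

     ~t "T_ah"
     <TRANSP> 3
        0.0 1.0 0.0
        0.0 0.6 0.4
        0.0 0.0 0.0
     ~h "b-ah+d"
     <BEGINHMM>
       <NUMSTATES> 3
       <STATE> 2
         <MEAN> 2
            0.0 0.0
         <VARIANCE> 2
            1.0 1.0
       ~t "T_ah"
     <ENDHMM>

Every other model tied to the same matrix would contain a further reference to ~t "T_ah", and re-estimating that matrix pools the training data from all of the models which share it.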

Note also that although parameter tying is implemented in a way which makes it transparent to the HTK re-estimation and recognition tools, in practice these tools do notice when a system has been tied and try to take advantage of it by avoiding redundant computations.

Although macro definitions could be written by hand, in practice tying is performed by executing HHEd commands and the required macros are then generated automatically. The basic HHEd command for tying a set of parameters is the TI command which has the form

   TI macroname itemlist
This causes all items in the given itemlist to be tied together and output as a macro called macroname. Macro names are written as a string of characters optionally enclosed in double quotes. The latter are necessary if the name contains one or more characters which are not letters or digits.
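
In practice, TI commands such as this are collected into an edit script which is then applied to the current model set by invoking HHEd. A typical invocation might look as follows, where the file and directory names are purely illustrative:

     HHEd -H hmm5/hmmdefs -M hmm6 tie.hed hmmlist

Here tie.hed contains the TI commands, hmm5/hmmdefs holds the current model definitions, hmm6 is the output directory for the tied models and hmmlist lists the names of all the HMMs to be edited.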

  [Fig. 9.1: possible paths down the HMM parameter tree]

Item lists use a simple language to identify sets of points in the HMM parameter hierarchy illustrated in Fig. 7.8. This language is defined fully in the reference entry for HHEd. The essential idea is that item lists represent paths down the hierarchical parameter tree where the direction down should be regarded as travelling from the root of the tree towards the leaves. A path can be unique or, more usually, it can be a pattern representing a set of paths down the tree. The point at which each path stops identifies one member of the set represented by the item list. Fig. 9.1 shows the possible paths down the tree. In text form the branches are replaced by dots and the underlined node names are possible terminating points. At the topmost level, an item list is a comma separated list of paths enclosed in braces.

Some examples should make all this clearer. Firstly, the following is a legal but somewhat long-winded way of specifying the set of items comprising states 2, 3 and 4 of the HMM called aa

     { aa.state[2],aa.state[3],aa.state[4] }
however in practice this would be written much more compactly as
     { aa.state[2-4] }
It must be emphasised that indices in item lists are really patterns. The set represented by an item list consists of all those elements which match the patterns. Thus, if aa only had two emitting states, the above item list would not generate an error; it would simply match only two items. The reason for this is that the same pattern can be applied to many different objects. For example, the HMM name can be replaced by a list of names enclosed in brackets; furthermore, each HMM name can include `?' characters which match any single character and `*' characters which match zero or more characters. Thus
     { (aa+*,iy+*,eh+*).state[2-4] }
represents states 2, 3 and 4 of all biphone models corresponding to the phonemes aa, iy and eh. If aa had just 2 emitting states and the others had 4 emitting states, then this item list would include 2 states from each of the aa models and 3 states from each of the others. Moving further down the tree, the item list
     { *.state[2-4].stream[1].mix[1,3].cov }
denotes the set of all covariance vectors (or matrices) of the first and third mixture components of stream 1, of states 2 to 4 of all HMMs. Since many HMM systems are single stream, the stream part of the path can be omitted if its value is 1. Thus, the above could have been written
     { *.state[2-4].mix[1,3].cov }
These last two examples also show that indices  can be written as comma separated lists as well as ranges, for example, [1,3,4-6,9] is a valid index list representing states 1, 3, 4, 5, 6, and 9.

When item lists are used as the argument to a TI command, the kind of items represented by the list determines the macro type in a fairly obvious way. The only non-obvious cases concern covariances and mixture components. Firstly, lists ending in cov generate ~v, ~i, ~c or ~x macros as appropriate. Secondly, if an explicit set of mixture components is defined as in

     { *.state[2].mix[1-5] }
then ~m macros are generated but omitting the indices altogether denotes a special case of mixture tying which is explained later in Chapter 10.

To illustrate the use of item lists, some example TI commands can now be given. Firstly, when a set of context-dependent models is created, it can be beneficial to share one transition matrix across all variants of a phone rather than having a distinct transition matrix for each. This could be achieved by adding TI commands immediately after the CL command described in the previous section, that is 

    CL cdlist
    TI T_ah {*-ah+*.transP}
    TI T_eh {*-eh+*.transP}
    TI T_ae {*-ae+*.transP}
    TI T_ih {*-ih+*.transP}
     ... etc

As a second example, a so-called Grand Variance  HMM system can be generated very easily with the following HHEd command

     TI "gvar" { *.state[2-4].mix[1].cov }
where it is assumed that the HMMs are 3-state single mixture component models. The effect of this command is to tie the variances of all states to a single global variance vector. For applications where there is limited training data, this technique can improve performance, particularly in noise.

Speech recognition systems will often have distinct models for silence and short pauses. A silence model sil may have the normal 3-state topology whereas a short pause model sp may have just a single state. To avoid the two models competing with each other, the sp model state can be tied to the centre state of the sil model thus

     TI "silst" { sp.state[2], sil.state[3] }

So far nothing has been said about how the parameters are actually determined when a set of items is replaced by a single shared representative. When states are tied, the state with the broadest variances and as few zero mixture component weights as possible is selected from the pool and used as the representative. When mean vectors are tied, the average of all the mean vectors in the pool is used, and when variances are tied, the largest variance in the pool is used. In all other cases, the last item in the tie-list is arbitrarily chosen as the representative. All of these selection criteria are ad hoc, but since the tie operations are always followed by explicit re-estimation using HERest, the precise choice of representative for a tied set is not critical.

Finally, tied parameters can be untied. For example, subsequent refinements of the context-dependent model set generated above with tied transition matrices might result in a much more compact set of models for which individual transition parameters could be robustly estimated. This can be done using the UT command  whose effect is to untie all of the items in its argument list. For example, the command

     UT {*-iy+*.transP}
would untie the transition parameters in all variants of the iy phoneme. This untying works by simply making unique copies of the tied parameters. These copies can then be re-estimated individually.
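
As with tying, the untying would normally be performed via an HHEd edit script and followed by further passes of HERest so that the newly separated parameters are properly re-estimated. A possible sequence, with purely illustrative file names, is

     HHEd -H hmm9/hmmdefs -M hmm10 untie.hed cdlist
     HERest -I labels.mlf -S train.scp -H hmm10/hmmdefs -M hmm11 cdlist

where untie.hed contains the UT command above, cdlist lists the context-dependent models, labels.mlf holds the training transcriptions at the model level and train.scp lists the training data files.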


