Next: 9.3 Parameter Tying and Item Lists Up: 9 HMM System Refinement Previous: 9.1 Using HHED

9.2 Constructing Context-Dependent Models

The first stage of model refinement is usually to convert a set of initialised and trained context-independent monophone HMMs to a set of context-dependent models. As explained in section 6.4, HTK uses the convention that a HMM name of the form l-p+r denotes the context-dependent version of the phone p which is to be used when the left neighbour is the phone l and the right neighbour is the phone r. To make a set of context-dependent phone models, it is only necessary to construct a HMM list, called say cdlist, containing the required context-dependent models and then execute HHED with a single command in its edit script:

    CL cdlist

The effect of this command is that for each model l-p+r in cdlist it makes a copy of the monophone p.
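
The cloning step can be pictured with a short Python sketch. It is an illustration of the naming convention only, not HHED's implementation: the helper names base_phone and clone_models are hypothetical, the models are stand-in dictionaries, and phone names are assumed to contain neither "-" nor "+".

```python
import copy

def base_phone(name):
    """Return p from a name of the form l-p+r, l-p, p+r or p.

    Assumes phone names themselves contain neither '-' nor '+'.
    """
    if "-" in name:
        name = name.split("-", 1)[1]   # drop the left context l
    if "+" in name:
        name = name.split("+", 1)[0]   # drop the right context r
    return name

def clone_models(monophones, cdlist):
    """Give every name in cdlist its own copy of the matching monophone.

    monophones maps a phone name to a stand-in parameter object.
    """
    return {name: copy.deepcopy(monophones[base_phone(name)]) for name in cdlist}

# Toy data: two "trained" monophones and a small context-dependent list.
monophones = {"ih": {"mean": [0.0]}, "t": {"mean": [1.0]}}
cloned = clone_models(monophones, ["s-ih+t", "ih-t+s", "t"])
```

Each clone starts out with parameters identical to its monophone but is a separate object, which is why the cloned set only becomes genuinely context-dependent after reestimation.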

The set of context-dependent models output by the above must be reestimated using HEREST. To do this, the training data transcriptions must be converted to use context-dependent labels and the original monophone HMM list must be replaced by cdlist. In fact, it is best to do this conversion before cloning the monophones, because if the HLED TC command is used then the -n option can be used to generate the required list of context-dependent HMMs automatically.

Before building a set of context-dependent models, it is necessary to decide whether or not cross-word triphones are to be used. If they are, then word boundaries in the training data can be ignored and all monophone labels can be converted to triphones. If, however, word-internal triphones are to be used, then word boundaries in the training transcriptions must be marked in some way (either by an explicit marker which is subsequently deleted or by using a short pause tee-model). This word boundary marker is then identified to HLED using the WB command, so that the TC command uses biphones rather than triphones at word boundaries (see section 6.4).
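
The difference between the two choices can be shown with a small Python sketch of the label conversion. It is a simplified illustration rather than HLED's algorithm: the function names are hypothetical, the utterance is given as a list of words each holding a list of phones, and special handling of silence or short pause models is left out.

```python
def _expand(phones):
    """Attach left and right contexts to each phone in one sequence."""
    labels = []
    for i, p in enumerate(phones):
        name = p
        if i > 0:
            name = phones[i - 1] + "-" + name      # left context l-
        if i + 1 < len(phones):
            name = name + "+" + phones[i + 1]      # right context +r
        labels.append(name)
    return labels

def to_context_dependent(words, cross_word=False):
    """Convert phones to l-p+r labels.

    cross_word=True ignores word boundaries; cross_word=False stops
    contexts at word boundaries, leaving biphones (or monophones for
    one-phone words) at the word edges.
    """
    if cross_word:
        return _expand([p for w in words for p in w])
    labels = []
    for w in words:
        labels.extend(_expand(w))
    return labels

words = [["b", "ih", "t"], ["ax"]]
internal = to_context_dependent(words)
# ['b+ih', 'b-ih+t', 'ih-t', 'ax']
crossed = to_context_dependent(words, cross_word=True)
# ['b+ih', 'b-ih+t', 'ih-t+ax', 't-ax']

# The distinct labels form the model list needed for cloning,
# which is the role played by the list that the -n option writes.
cdlist = sorted(set(internal))
```

In the word-internal case the single-phone word keeps its monophone label, so the resulting model list mixes monophones, biphones and triphones.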

All HTK tools can read and write HMM definitions in text or binary form. Text is good for seeing exactly what the tools are producing, but binary is much faster to load and store, and much more compact. Binary output is enabled either using the standard option -B or by setting the configuration variable SAVEBINARY. In the above example, the HMM set input to HHED will contain a small set of monophones whereas the output will be a large set of triphones. In order to save storage and computation, this is usually a good point to switch to binary storage of MMFs.
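
As a rough sketch of how the cloning run and the switch to binary storage fit together, the Python below builds the one-line edit script and an argument list for the tool without running anything. The file and directory names (mktri.hed, monophones, hmm9/hmmdefs, hmm10) are placeholders, the helper names are hypothetical, and the executable is assumed to be installed under the name HHEd with the usual -H (input MMF) and -M (output directory) options.

```python
def clone_script(cdlist_name):
    """Return the text of an edit script holding the single CL command."""
    return "CL " + cdlist_name + "\n"

def hhed_command(script, hmm_list, mmf_in, out_dir, binary=True):
    """Build the argument list for a cloning run.

    binary=True adds the standard -B option so that the (much larger)
    output model set is stored in binary form.
    """
    cmd = ["HHEd"]
    if binary:
        cmd.append("-B")
    cmd += ["-H", mmf_in, "-M", out_dir, script, hmm_list]
    return cmd

script_text = clone_script("cdlist")
cmd = hhed_command("mktri.hed", "monophones", "hmm9/hmmdefs", "hmm10")
print(" ".join(cmd))
```

The list returned by hhed_command could be passed to subprocess.run once script_text has been written to the script file; setting SAVEBINARY in a configuration file is the alternative to passing -B on each call.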


ECRL HTK_V2.1