
3.3.1 Step 9 - Making Triphones from Monophones

Context-dependent triphones can be made by simply cloning monophones and then re-estimating using triphone transcriptions. The latter should be created first using HLEd, because a side-effect is to generate a list of all the triphones for which there is at least one example in the training data. For example, executing

    HLEd -n triphones0 -l '*' -i lib/wintri.mlf mktri.led aligned.mlf
will convert the monophone transcriptions in aligned.mlf to an equivalent set of triphone transcriptions in wintri.mlf. At the same time, a list of triphones is written to the file triphones0. The edit script mktri.led contains the commands
    WB sp
    WB sil
    TC
The two WB commands define sp and sil as word boundary symbols. These then block the addition of context in the TC command, the last line of the script, which converts all phones (except word boundary symbols) to triphones. For example,
    sil th ih s sp m ae n sp ...
becomes
    sil th+ih th-ih+s ih-s sp m+ae m-ae+n ae-n sp ...
This style of triphone transcription is referred to as word internal. Note that some biphones will also be generated, because at word boundaries the context sometimes includes only two phones.
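The effect of the WB and TC commands can be sketched in a few lines. This is a minimal illustration of word-internal context expansion, not HLEd's own code; the function names are hypothetical, and leaving sp and sil out of the collected list is an assumption based on the later instruction to add them to the list by hand.

```python
# Sketch of word-internal triphone expansion, imitating the effect of the
# HLEd script "WB sp / WB sil / TC" (hypothetical helper names, not HTK code).

WORD_BOUNDARY = {"sp", "sil"}  # symbols declared with WB


def word_internal_triphones(phones):
    """Convert a monophone sequence into word-internal context-dependent labels."""
    out = []
    for i, p in enumerate(phones):
        if p in WORD_BOUNDARY:
            out.append(p)  # boundary symbols are copied unchanged
            continue
        left = phones[i - 1] if i > 0 else None
        right = phones[i + 1] if i + 1 < len(phones) else None
        label = p
        # Context is added only when the neighbour is not a word boundary.
        if left is not None and left not in WORD_BOUNDARY:
            label = f"{left}-{label}"
        if right is not None and right not in WORD_BOUNDARY:
            label = f"{label}+{right}"
        out.append(label)
    return out


def triphone_list(transcriptions):
    """Collect every distinct label seen, in first-seen order (cf. HLEd -n).

    Assumption: sp and sil are left out, since they are added later by hand.
    """
    seen = []
    for phones in transcriptions:
        for label in word_internal_triphones(phones):
            if label not in WORD_BOUNDARY and label not in seen:
                seen.append(label)
    return seen


if __name__ == "__main__":
    utterance = "sil th ih s sp m ae n sp".split()
    print(" ".join(word_internal_triphones(utterance)))
    print(triphone_list([utterance]))
```

Run on the example utterance above, the first line printed is the same word-internal transcription shown in the text, with th+ih and ih-s appearing as the biphones at the word edges.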

The cloning of models can be done efficiently using the HMM editor HHEd:

    HHEd -B -H hmm9/macros -H hmm9/hmmdefs -M hmm10 \
         mktri.hed monophones1
where the edit script mktri.hed contains a clone command CL followed by TI commands to tie all of the transition matrices in each triphone set, that is:
    CL triphones0
    TI T_ah {(*-ah+*,ah+*,*-ah).transP}
    TI T_ax {(*-ax+*,ax+*,*-ax).transP}
    TI T_ey {(*-ey+*,ey+*,*-ey).transP}
    TI T_b {(*-b+*,b+*,*-b).transP}
    TI T_ay {(*-ay+*,ay+*,*-ay).transP}
    ...
The file mktri.hed can be generated using the Perl script maketrihed included in the HTKTutorial directory.
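For readers without Perl, the shape of mktri.hed can be reproduced with a short sketch. This is not the maketrihed script; the function names are hypothetical, and skipping sp and sil (which receive no context and so have no triphone variants to tie) is an assumption about what the generated file should contain.

```python
# Hypothetical generator for an HHEd script of the form shown above:
# one CL command, then one TI command per monophone tying its transP.


def make_tri_hed(monophones, clone_list="triphones0", skip=("sp", "sil")):
    """Return the text of a CL/TI edit script for the given monophones."""
    lines = [f"CL {clone_list}"]
    for p in monophones:
        if p in skip:  # assumption: word boundary models are not tied here
            continue
        # Doubled braces produce literal { } in the f-string.
        lines.append(f"TI T_{p} {{(*-{p}+*,{p}+*,*-{p}).transP}}")
    return "\n".join(lines) + "\n"


def write_tri_hed(monophone_file, out_file, clone_list="triphones0"):
    """Read a one-phone-per-line list (e.g. monophones1) and write the script."""
    with open(monophone_file) as f:
        phones = [line.strip() for line in f if line.strip()]
    with open(out_file, "w") as f:
        f.write(make_tri_hed(phones, clone_list))


if __name__ == "__main__":
    print(make_tri_hed(["ah", "ax", "ey", "b", "ay", "sil", "sp"]), end="")
```

The printed script starts with the same CL and TI lines as the listing above.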

The clone command CL takes as its argument the name of the file containing the list of triphones (and biphones) generated above. For each model of the form a-b+c in this list, it looks for the monophone b and makes a copy of it. Each TI command takes as its argument the name of a macro and a list of HMM components. The latter uses a notation which attempts to mimic the hierarchical structure of the HMM parameter set, in which the transition matrix transP can be regarded as a sub-component of each HMM. The items within the brackets are patterns designed to match the set of triphones, right biphones and left biphones for each phone.
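To make the CL and TI semantics concrete, the sketch below shows which monophone each model in the list would be cloned from, and which models the three patterns of one TI command would gather under a single macro. Python's fnmatch is used as a stand-in for HHEd's own pattern matching (both treat `*` as "any string"); the helper names are hypothetical.

```python
from fnmatch import fnmatchcase


def base_phone(model):
    """Central phone of a-b+c, b+c or a-b: the monophone that CL copies."""
    without_left = model.split("-", 1)[-1]  # drop "a-" if present
    return without_left.split("+", 1)[0]    # drop "+c" if present


def tie_group(phone, models):
    """Models matched by the item list (*-p+*, p+*, *-p) of a TI command."""
    patterns = [f"*-{phone}+*", f"{phone}+*", f"*-{phone}"]
    return [m for m in models if any(fnmatchcase(m, pat) for pat in patterns)]


if __name__ == "__main__":
    models = ["th+ih", "th-ih+s", "ih-s", "m+ae", "m-ae+n", "ae-n"]
    for m in models:
        print(m, "is cloned from", base_phone(m))
    print("T_ae ties:", tie_group("ae", models))
```

Note that a left biphone such as ih-s belongs to the group for s, not ih, because s is its central phone; every model in a group therefore shares the base phone it was cloned from.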

[Fig. 3.12: image not reproduced]

Up to now, macros and tying have only been mentioned in passing. Although a full explanation must wait until chapter 7, a brief explanation is warranted here. Tying means that two or more HMMs share the same set of parameters. On the left side of Fig. 3.12, two HMM definitions are shown. Each HMM has its own individual transition matrix. On the right side, the effect of the first TI command in the edit script mktri.hed is shown. The individual transition matrices have been replaced by a reference to a macro called T_ah which contains a matrix shared by both models. When re-estimating tied parameters, the data which would have been used for each of the original untied parameters is pooled so that a much more reliable estimate can be obtained.
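The pooling described above can be illustrated numerically. In Baum-Welch re-estimation a transition probability is the expected number of i-to-j transitions divided by the expected number of transitions out of state i; when a matrix is tied, the counts from every model sharing it are summed before dividing. The sketch below assumes a simplified one-row matrix (stay in or leave a single emitting state) and made-up counts; it is an illustration, not HERest's code.

```python
def normalise(counts):
    """ML transition probabilities from expected counts (one row per from-state)."""
    probs = []
    for row in counts:
        total = sum(row)
        probs.append([c / total for c in row] if total > 0 else [0.0] * len(row))
    return probs


def pool(per_model_counts):
    """Sum the accumulators of all models that share one tied matrix."""
    rows = len(per_model_counts[0])
    cols = len(per_model_counts[0][0])
    return [[sum(m[i][j] for m in per_model_counts) for j in range(cols)]
            for i in range(rows)]


if __name__ == "__main__":
    # Hypothetical counts for one emitting state: [stay, leave]
    common = [[80.0, 20.0]]  # a frequently seen triphone
    rare = [[1.0, 1.0]]      # a triphone seen only once
    print("untied rare:", normalise(rare))
    print("tied:", normalise(pool([common, rare])))
```

Untied, the rare triphone's estimate rests on two transitions and comes out at 0.5/0.5; tied, both models share an estimate drawn from 102 transitions.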

Of course, tying could degrade performance if applied indiscriminately. Hence, it is important to tie only parameters which have little effect on discrimination. This is the case here, where the transition parameters do not vary significantly with acoustic context but nevertheless need to be estimated accurately. Some triphones will occur only once or twice, and so very poor estimates would be obtained if tying were not done. These problems of data insufficiency will affect the output distributions too, but this will be dealt with in the next step.

Hitherto, all HMMs have been stored in text format and could be inspected like any text file. Now, however, the model files will grow larger, so disk space and load/store times become an issue. For increased efficiency, HTK can store and load MMFs in binary format. Setting the standard -B option causes this to happen.

[Fig. 3.13: image not reproduced]

Once the context-dependent models have been cloned, the new triphone set can be re-estimated using HERest. This is done as before, except that the monophone model list is replaced by a triphone list and the triphone transcriptions are used in place of the monophone transcriptions. The context-dependent triphone list generated above (triphones0) must be augmented with the context-independent sp and sil models to give a new list (triphones1). This is simple to do by hand.
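If you prefer to script the edit rather than do it by hand, a minimal sketch follows. It assumes a plain one-model-per-line list file; the function names are hypothetical.

```python
from pathlib import Path


def augment(names, extra=("sp", "sil")):
    """Return the model list with each context-independent model appended once."""
    result = list(names)
    for name in extra:
        if name not in result:
            result.append(name)
    return result


def make_triphones1(src="triphones0", dst="triphones1"):
    """Read a one-model-per-line list, add sp and sil, and write the new list."""
    text = Path(src).read_text()
    names = [line.strip() for line in text.splitlines() if line.strip()]
    Path(dst).write_text("\n".join(augment(names)) + "\n")
```

Calling `make_triphones1()` in the working directory would write triphones1 alongside triphones0; existing entries are never duplicated.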

For the final pass of HERest, the -s option should be used to generate a file of state occupation statistics called stats. In combination with the means and variances, these enable likelihoods to be calculated for clusters of states and are needed during the state-clustering process described in the next step. Fig. 3.13 illustrates this step of the HMM construction procedure. Re-estimation should again be done twice, so that the resulting model set is ultimately saved in hmm12.

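   HERest -B -C config -I lib/wintri.mlf -t 250.0 150.0 1000.0 \
    -S train.scp -H hmm10/macros -H hmm10/hmmdefs -M hmm11 triphones1
   HERest -B -C config -I lib/wintri.mlf -t 250.0 150.0 1000.0 -s stats \
    -S train.scp -H hmm11/macros -H hmm11/hmmdefs -M hmm12 triphones1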
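As a rough illustration of why occupation counts are needed, the sketch below computes the log likelihood of the data assigned to a cluster of states from nothing but each state's occupation count, mean and variance. It assumes each state is a single diagonal-covariance Gaussian; the function names are hypothetical, and it does not parse the stats file, whose exact layout is not described here.

```python
import math


def pool_states(occ, means, variances):
    """Merge single diagonal-covariance Gaussians weighted by state occupation."""
    total = sum(occ)
    dim = len(means[0])
    mean = [sum(g * m[d] for g, m in zip(occ, means)) / total for d in range(dim)]
    # Pooled second moment minus squared pooled mean gives the pooled variance.
    var = [sum(g * (v[d] + m[d] ** 2) for g, m, v in zip(occ, means, variances)) / total
           - mean[d] ** 2 for d in range(dim)]
    return total, mean, var


def cluster_log_likelihood(occ, means, variances):
    """Log likelihood of the cluster's data under its pooled ML Gaussian."""
    total, _, var = pool_states(occ, means, variances)
    dim = len(var)
    return -0.5 * total * (dim * math.log(2 * math.pi)
                           + sum(math.log(v) for v in var) + dim)


if __name__ == "__main__":
    occ = [10.0, 10.0]
    means = [[0.0], [2.0]]
    variances = [[1.0], [1.0]]
    separate = sum(cluster_log_likelihood([g], [m], [v])
                   for g, m, v in zip(occ, means, variances))
    merged = cluster_log_likelihood(occ, means, variances)
    print("decrease from merging:", separate - merged)
```

Merging two states with different means widens the pooled variance and lowers the likelihood; measuring that decrease for candidate clusters is the kind of calculation the stats file makes possible without revisiting the training data.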



ECRL HTK_V2.1