
3.2.1 Step 6 - Creating Flat Start Monophones

The first step in HMM training is to define a prototype model. The parameters of this model are not important; its purpose is to define the model topology. For phone-based systems, a good topology to use is 3-state left-right with no skips, such as the following:

    ~o <VecSize> 39 <MFCC_0_D_A>
    ~h "proto"
    <BeginHMM>
     <NumStates> 5
     <State> 2
        <Mean> 39
          0.0 0.0 0.0 ...
        <Variance> 39
          1.0 1.0 1.0 ...
     <State> 3
        <Mean> 39
          0.0 0.0 0.0 ...
        <Variance> 39
          1.0 1.0 1.0 ...
     <State> 4
        <Mean> 39
          0.0 0.0 0.0 ...
        <Variance> 39
          1.0 1.0 1.0 ...
     <TransP> 5
      0.0 1.0 0.0 0.0 0.0
      0.0 0.6 0.4 0.0 0.0
      0.0 0.0 0.6 0.4 0.0
      0.0 0.0 0.0 0.7 0.3
      0.0 0.0 0.0 0.0 0.0
    <EndHMM>
where each elided vector is of length 39. This number is the length of the parameterised static vector (MFCC_0 = 13) plus the delta coefficients (+13) plus the acceleration coefficients (+13).
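Writing out three 39-element mean and variance vectors by hand is tedious, so the prototype can be generated instead. The following is a minimal Python sketch, not part of HTK; the function name make_proto and its arguments are hypothetical. It reproduces the layout and transition values of the listing above, which are placeholders in any case since HCompV overwrites the means and variances.

```python
def make_proto(n_static=13, n_emitting=3, parm_kind="MFCC_0_D_A", name="proto"):
    """Return the text of a left-right, no-skip prototype HMM (hypothetical helper)."""
    vec_size = n_static * 3          # static + delta + acceleration = 39
    n_states = n_emitting + 2        # plus non-emitting entry and exit states
    zeros = " ".join(["0.0"] * vec_size)
    ones = " ".join(["1.0"] * vec_size)

    lines = [f"~o <VecSize> {vec_size} <{parm_kind}>",
             f'~h "{name}"',
             "<BeginHMM>",
             f" <NumStates> {n_states}"]
    for state in range(2, n_states):  # emitting states are numbered 2..n_states-1
        lines += [f" <State> {state}",
                  f"    <Mean> {vec_size}", f"      {zeros}",
                  f"    <Variance> {vec_size}", f"      {ones}"]

    lines.append(f" <TransP> {n_states}")
    for i in range(n_states):
        row = [0.0] * n_states
        if i == 0:
            row[1] = 1.0              # entry state always moves to the first emitting state
        elif i < n_states - 1:
            # self-loop or advance by one state; values copied from the listing above
            stay = 0.7 if i == n_states - 2 else 0.6
            row[i], row[i + 1] = stay, 1.0 - stay
        lines.append("  " + " ".join(f"{p:.1f}" for p in row))
    lines.append("<EndHMM>")
    return "\n".join(lines) + "\n"


proto_text = make_proto()
print(proto_text.splitlines()[0])
```

The returned text can be written to a file called proto before HCompV is run.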

The HTK tool HCompV will scan a set of data files, compute the global mean and variance, and set all of the Gaussians in a given HMM to have the same mean and variance. Hence, assuming that a list of all the training files is stored in train.scp, the command

    HCompV -C config -f 0.01 -m -S train.scp -M hmm0 proto
will create a new version of proto in the directory hmm0 in which the zero means and unit variances above have been replaced by the global speech means and variances. Note that the prototype HMM defines the parameter kind as MFCC_0_D_A. This means that delta and acceleration coefficients are to be computed and appended to the static MFCC coefficients computed and stored during the coding process described above. To ensure that they are computed when the data is loaded, the TARGETKIND entry in the configuration file config should be changed to
   TARGETKIND = MFCC_0_D_A
Two options are specified for HCompV here. The -f option causes a variance floor macro (called vFloors) to be generated which is equal to 0.01 times the global variance. This is a vector of values which will be used to set a floor on the variances estimated in the subsequent steps. The -m option asks for means to be computed as well as variances.

Given this new prototype model stored in the directory hmm0, a Master Macro File (MMF) called hmmdefs is constructed containing a copy of the prototype for each of the required monophone HMMs. The format of an MMF is similar to that of an MLF and it serves a similar purpose in that it avoids having a large number of individual HMM definition files (see Fig. 3.7).
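Constructing hmmdefs amounts to copying the prototype definition once per phone and changing the name in the ~h line. A minimal Python sketch of this is given below. It assumes that hmm0/proto contains exactly one ~h line followed by a single HMM definition; the function name build_hmmdefs and the tiny example prototype are hypothetical.

```python
def build_hmmdefs(proto_text, phones):
    """Copy the prototype body once per phone, renaming the ~h macro each time."""
    lines = proto_text.splitlines()
    # Everything after the ~h line is taken to be the HMM definition itself;
    # anything before it (the ~o global options) is not copied into hmmdefs.
    start = next(i for i, line in enumerate(lines)
                 if line.strip().startswith("~h"))
    body = lines[start + 1:]

    out = []
    for phone in phones:
        out.append(f'~h "{phone}"')
        out.extend(body)
    return "\n".join(out) + "\n"


# Illustrative input only; a real prototype would be read from hmm0/proto
# and the phone names from the model list (one name per line).
example_proto = '~o <VecSize> 39 <MFCC_0_D_A>\n~h "proto"\n<BeginHMM>\n<EndHMM>\n'
print(build_hmmdefs(example_proto, ["aa", "sil"]))
```

The result would be written to hmm0/hmmdefs.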


The flat start monophones stored in the directory hmm0 are re-estimated using the embedded re-estimation tool HERest, invoked as follows:

   HERest -C config -I phones0.mlf -t 250.0 150.0 1000.0 \
    -S train.scp -H hmm0/macros -H hmm0/hmmdefs -M hmm1 monophones0
The effect of this is to load all the models in hmm0 which are listed in the model list monophones0, re-estimate them using the data listed in train.scp and store the new model set in the directory hmm1. Most of the files used in this invocation of HERest have already been described. The exception is the file macros. This should contain a so-called global options macro and the variance floor macro vFloors generated earlier. The global options macro simply defines the HMM parameter kind and the vector size, i.e.
   ~o <MFCC_0_D_A> <VecSize> 39
See Fig. 3.7. The global options macro can be combined with vFloors into a text file called macros.
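Combining the two is a plain text concatenation. A minimal Python sketch follows; the function name build_macros is hypothetical, and the vFloors text used in the example is a shortened stand-in rather than real HCompV output.

```python
def build_macros(vfloors_text, vec_size=39, parm_kind="MFCC_0_D_A"):
    """Prepend the global options macro to the contents of vFloors."""
    header = f"~o <{parm_kind}> <VecSize> {vec_size}\n"
    return header + vfloors_text


# Stand-in for the contents of hmm0/vFloors (illustrative only).
example_vfloors = "~v varFloor1\n<Variance> 39\n ...\n"
print(build_macros(example_vfloors))
```

The result would be written to hmm0/macros.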


The -t option sets the pruning thresholds to be used during training. Pruning limits the range of state alignments that the forward-backward algorithm includes in its summation and it can reduce the amount of computation required by an order of magnitude. For most training files, a very tight pruning threshold can be set. However, some training files will provide poorer acoustic matching and in consequence a wider pruning beam is needed. HERest deals with this by having an auto-incrementing pruning threshold. In the above example, pruning is normally 250.0. If re-estimation fails on any particular file, the threshold is increased by 150.0 and the file is reprocessed. This is repeated until either the file is successfully processed or the pruning limit of 1000.0 is exceeded. At this point it is safe to assume that there is a serious problem with the training file and hence the fault should be fixed (typically it will be an incorrect transcription) or the training file should be discarded.

The process leading to the initial set of monophones in the directory hmm0 is illustrated in Fig. 3.8.
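The retry behaviour described above can be pictured with a short Python sketch. This is an illustration of the schedule only, not HERest's actual implementation. The names pruning_schedule and first_working_beam are hypothetical, and it is an assumption that a beam exactly equal to the limit is still tried.

```python
def pruning_schedule(initial=250.0, increment=150.0, limit=1000.0):
    """Yield successive pruning thresholds until the limit is exceeded."""
    beam = initial
    while beam <= limit:   # assumption: a beam equal to the limit is still tried
        yield beam
        beam += increment


def first_working_beam(succeeds, initial=250.0, increment=150.0, limit=1000.0):
    """Return the first beam for which succeeds(beam) is true, or None.

    `succeeds` stands in for running forward-backward on one training
    file with the given beam; None means the file should be fixed or
    discarded.
    """
    for beam in pruning_schedule(initial, increment, limit):
        if succeeds(beam):
            return beam
    return None


print(list(pruning_schedule()))
# A file that only aligns once the beam reaches 500 or more:
print(first_working_beam(lambda beam: beam >= 500.0))
```

With the values given to -t above, the thresholds tried under these assumptions are 250, 400, 550, 700, 850 and 1000.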

Each time HERest is run it performs a single re-estimation. Each new HMM set is stored in a new directory. Execution of HERest should be repeated twice more, changing the names of the input and output directories (set with the options -H and -M) each time, until the directory hmm3 contains the final set of initialised monophone HMMs.
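The three passes differ only in their directory names, so the command lines can be generated in a loop. The Python sketch below builds and prints them without running anything; the function name herest_command is hypothetical. The remaining arguments are copied from the invocation above, and the output directories hmm1 to hmm3 are assumed to exist already.

```python
def herest_command(src_dir, dst_dir):
    """Build the HERest argument list for one re-estimation pass."""
    return ["HERest", "-C", "config", "-I", "phones0.mlf",
            "-t", "250.0", "150.0", "1000.0",
            "-S", "train.scp",
            "-H", f"{src_dir}/macros", "-H", f"{src_dir}/hmmdefs",
            "-M", dst_dir, "monophones0"]


# hmm0 -> hmm1, hmm1 -> hmm2, hmm2 -> hmm3
commands = [herest_command(f"hmm{i}", f"hmm{i + 1}") for i in range(3)]
for command in commands:
    print(" ".join(command))
```

Each list could be passed to subprocess.run(command, check=True) to execute the passes in order, stopping at the first failure.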


ECRL HTK_V2.1: email [email protected]