Next: 9.5 Tree-Based Clustering Up: 9 HMM System Refinement Previous: 9.3 Parameter Tying and Item Lists

9.4 Data-Driven Clustering

 

Section 9.2 described a method of triphone construction which involved cloning all monophones and then re-estimating them using data in which the monophone labels have been replaced by triphone labels. This leads to a very large set of models with relatively little training data for each model. If it is argued that context does not greatly affect the centre states of triphone models, then one way to reduce the total number of parameters, without significantly altering the models' ability to represent the different contextual effects, is to tie all of the centre states across all models derived from the same monophone. This tying could be done by writing an edit script of the form

     TI "iyS3" {*-iy+*.state[3]}
     TI "ihS3" {*-ih+*.state[3]}
     TI "ehS3" {*-eh+*.state[3]}
      .... etc
Each TI command would tie all the centre states of all triphones in each phone group. Hence, if there were an average of 100 triphones per phone group then the total number of states per group would be reduced from 300 to 201.
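
The arithmetic behind these figures can be checked with a short Python sketch. The function name and its arguments are illustrative only and are not part of HTK; the sketch assumes three emitting states per model, as in the example above.

```python
def states_after_centre_tying(num_triphones: int, states_per_model: int = 3) -> int:
    """Count the distinct states left in one phone group after centre-state tying.

    Every triphone keeps its own outer states; all centre states collapse
    into a single shared state.
    """
    untied_outer_states = num_triphones * (states_per_model - 1)
    shared_centre_state = 1
    return untied_outer_states + shared_centre_state


states_before = 100 * 3                          # 100 triphones, 3 states each
states_after = states_after_centre_tying(100)    # 200 outer states + 1 tied centre
print(states_before, states_after)
```
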

Explicit tyings such as these can have some positive effect but overall they are not very satisfactory. Tying all centre states is too severe and, worse still, the problem of undertraining for the left and right states remains. A much better approach is to use clustering to decide which states to tie. HHED provides two mechanisms for this. This section describes a data-driven clustering approach and the next section presents an alternative decision tree-based approach.

Data-driven clustering is performed by the TC and NC commands. Both invoke the same bottom-up hierarchical procedure. Initially all states are placed in individual clusters. The pair of clusters which, when combined, would form the smallest resultant cluster is then merged. This process repeats until either the size of the largest cluster reaches the threshold set by the TC command or the total number of clusters has fallen to that specified by the NC command. The size of a cluster is defined as the greatest distance between any two of its states. The distance metric depends on the type of state distribution. For single Gaussians, a weighted Euclidean distance between the means is used and for tied-mixture systems a Euclidean distance between the mixture weights is used. For all other cases, the average probability of each component mean with respect to the other state is used. The details of the algorithm and these metrics are given in the reference section for HHED.
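
The merging procedure just described can be sketched in Python as follows. This is a minimal illustration and not the HHED implementation: the function names are hypothetical, states are represented as plain coordinate tuples, and an unweighted Euclidean distance stands in for the metrics listed above. The sketch also assumes that merging stops as soon as the best available merge would produce a cluster larger than the threshold.

```python
from itertools import combinations
from math import dist


def cluster_size(cluster, distance):
    """Size of a cluster: the greatest distance between any two of its members."""
    return max((distance(a, b) for a, b in combinations(cluster, 2)), default=0.0)


def cluster_states(states, distance=dist, threshold=None, num_clusters=None):
    """Merge clusters pairwise until a stopping rule applies.

    threshold plays the role of the TC argument and num_clusters that of
    the NC argument (both parameter names are assumptions of this sketch).
    """
    if threshold is None and num_clusters is None:
        raise ValueError("give a threshold or a target number of clusters")
    clusters = [[s] for s in states]          # start with one state per cluster
    while len(clusters) > 1:
        if num_clusters is not None and len(clusters) <= num_clusters:
            break
        # Find the pair whose union would form the smallest resultant cluster.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: cluster_size(clusters[ij[0]] + clusters[ij[1]], distance),
        )
        merged = clusters[i] + clusters[j]
        if threshold is not None and cluster_size(merged, distance) > threshold:
            break
        clusters[i] = merged
        del clusters[j]                       # j > i, so index i is unaffected
    return clusters


points = [(0.0,), (1.0,), (10.0,), (11.0,)]
by_threshold = cluster_states(points, threshold=2.0)
by_count = cluster_states(points, num_clusters=3)
print(by_threshold)
print(by_count)
```

Because the cluster size is the maximum pairwise distance, this corresponds to what is usually called complete-linkage clustering. The exhaustive pair search is only practical for small sets of states.
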

  [Fig. 9.2]

As an example, the following HHED script would cluster and tie the corresponding states of the triphone group for the phone ih:

     TC 100.0 "ihS2" {*-ih+*.state[2]}
     TC 100.0 "ihS3" {*-ih+*.state[3]}
     TC 100.0 "ihS4" {*-ih+*.state[4]}
In this example, each TC command performs clustering on the specified set of states; each resulting cluster is then tied and output as a macro. The macro name is generated by appending the cluster index to the macro name given in the command. The effect of these commands is illustrated in Fig. 9.2. Note that if a word-internal triphone system is being built, it is sensible to include biphones as well as triphones in the item list. For example, the first command above would be written as
     TC 100.0 "ihS2" {(*-ih,ih+*,*-ih+*).state[2]}
If the above TC commands are repeated for all phones, the resulting set of tied-state models will have far fewer parameters in total than the original untied set. The numeric argument immediately following the TC command name is the cluster threshold. Increasing this value will allow larger, and hence fewer, clusters. The aim, of course, is to strike the right balance between compactness and the acoustic accuracy of the individual models. In practice, the use of this command requires some experimentation to find a good threshold value. HHED provides extensive trace output for monitoring clustering operations. Note in this respect that as well as setting tracing from the command line and the configuration file, tracing in HHED can be set by the TR command. Thus, tracing can be controlled at the command level. Further trace information can be obtained by including the SH command at strategic points in the edit script. The effect of executing this command is to list all of the parameter tyings currently in force.
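
Since the same three TC commands are needed for every phone, the edit script is often easier to generate than to write by hand. The following Python sketch shows one way to do this. The function name, its parameters and the phone list are illustrative assumptions; only the format of the generated lines is taken from the examples above.

```python
def tc_commands(phones, threshold, states=(2, 3, 4), word_internal=False):
    """Return one TC command line per (phone, state) pair.

    With word_internal=True the item list also covers the left and right
    biphones of each phone, as suggested for word-internal systems.
    """
    lines = []
    for phone in phones:
        if word_internal:
            items = f"(*-{phone},{phone}+*,*-{phone}+*)"
        else:
            items = f"*-{phone}+*"
        for state in states:
            # Doubled braces produce the literal { } of the item list.
            lines.append(
                f'TC {threshold} "{phone}S{state}" {{{items}.state[{state}]}}'
            )
    return lines


# Hypothetical phone list; a real system would read it from its monophone list.
script_lines = tc_commands(["iy", "ih", "eh"], 100.0)
print("\n".join(script_lines))
```

Keeping the threshold as a single argument makes it straightforward to regenerate the script when experimenting with different values.
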

A potential problem with the use of the TC and NC commands is that outlier states will tend to form their own singleton clusters for which there is then insufficient data for proper training. One solution to this is to use the RO command to remove outliers. This command has the form

     RO thresh "statsfile"
where statsfile is the name of a statistics file  output using the -s option of HEREST. This statistics file holds the occupation counts for all states of the HMM set being trained. The term occupation count refers to the number of frames allocated to a particular state and can be used as a measure of how much training data is available for estimating the parameters of that state. The RO command must be executed before the TC or NC commands used to do the actual clustering. Its effect is to simply read in the statistics information from the given file and then to set a flag instructing the TC or NC commands to remove any outliers remaining at the conclusion of the normal clustering process. This is done by repeatedly finding the cluster with the smallest total occupation count and merging it with its nearest neighbour. This process is repeated until all clusters have a total occupation count which exceeds thresh, thereby ensuring that every cluster of states will be properly trained in the subsequent re-estimation performed by HEREST. 
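
The outlier removal step can be sketched in Python as follows. This is an illustration under stated assumptions rather than the HHED code: the function names are hypothetical, the occupation counts are supplied as a dictionary instead of being read from a statistics file, and the distance between two clusters is supplied by the caller, since the text above does not say how the nearest neighbour is measured.

```python
def remove_outliers(clusters, occupancy, cluster_distance, thresh):
    """Merge under-occupied clusters into their nearest neighbours.

    clusters: list of lists of state names.
    occupancy: dict mapping each state name to its occupation count.
    cluster_distance: function giving the distance between two clusters.
    """
    clusters = [list(c) for c in clusters]    # work on a copy

    def total(cluster):
        return sum(occupancy[s] for s in cluster)

    while len(clusters) > 1:
        smallest = min(range(len(clusters)), key=lambda i: total(clusters[i]))
        if total(clusters[smallest]) > thresh:
            break                             # every cluster now exceeds thresh
        others = [i for i in range(len(clusters)) if i != smallest]
        nearest = min(
            others,
            key=lambda i: cluster_distance(clusters[smallest], clusters[i]),
        )
        clusters[nearest].extend(clusters[smallest])
        del clusters[smallest]
    return clusters


# Toy data: state "c" has been seen in only 3 frames.
occupancy = {"a": 500, "b": 400, "c": 3}
position = {"a": 0.0, "b": 5.0, "c": 6.0}     # hypothetical 1-D state means


def nearest_member_distance(c1, c2):
    """Assumed cluster distance: the gap between the two closest members."""
    return min(abs(position[x] - position[y]) for x in c1 for y in c2)


cleaned = remove_outliers([["a"], ["b"], ["c"]], occupancy,
                          nearest_member_distance, thresh=100)
print(cleaned)
```
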

On completion of the above clustering and tying procedures, many of the models may be effectively identical, since acoustically similar triphones may share common clusters for all their emitting states. They are then, in effect, so-called generalised triphones. State tying can be further exploited if the HMMs which are effectively equivalent are identified and then tied via the physical-logical mapping facility provided by HMM lists (see section 7.4). The effect of this would be to reduce the total number of HMM definitions required. HHED provides a compaction command to do all of this automatically. For example, the command

     CO newList
will compact the currently loaded HMM set by identifying equivalent models and then tying them via the new HMM list output to the file newList. Note, however, that for two HMMs to be tied, they must be identical in all respects. This is one of the reasons why transition parameters are often tied across triphone groups. Otherwise, HMMs with identical states would still be left distinct due to minor differences in their transition matrices.
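
The idea behind compaction can be illustrated with a small Python sketch. It is not the CO implementation: each model is reduced here to a tuple of tied-state macro names plus a transition macro name, and all of the names used, including the function name, are hypothetical.

```python
def compact(models):
    """Map each logical model name to the first model with an identical definition.

    models: dict mapping a logical model name to a hashable definition,
    here (tuple of state macro names, transition macro name).
    """
    physical_for = {}                 # definition -> chosen physical model name
    mapping = {}                      # logical name -> physical name
    for name, definition in models.items():
        mapping[name] = physical_for.setdefault(definition, name)
    return mapping


models = {
    "s-ih+t": (("ihS21", "ihS31", "ihS41"), "T_ih"),
    "z-ih+t": (("ihS21", "ihS31", "ihS41"), "T_ih"),    # same states and transitions
    "s-ih+d": (("ihS21", "ihS31", "ihS42"), "T_ih"),    # differs in the last state
    "f-ih+t": (("ihS21", "ihS31", "ihS41"), "T_ih_f"),  # differs only in transitions
}
mapping = compact(models)
num_physical = len(set(mapping.values()))
print(mapping, num_physical)
```

In this toy set the model f-ih+t shares all of its states with s-ih+t but remains a separate physical model because its transition matrix is not tied, which is the situation that tying transitions across a triphone group avoids.
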



ECRL HTK_V2.1: email [email protected]