HINIT is used to provide initial estimates for the parameters
of a single HMM using a set of observation sequences.
It works by repeatedly using Viterbi alignment to segment the
training observations and then recomputing the parameters
by pooling the vectors in each segment. For mixture Gaussians, each
vector in each segment is aligned with the component with the highest
likelihood. Each cluster of vectors then determines the parameters
of the associated mixture component.
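The component assignment described above can be sketched as follows. This is only an illustration for the vectors pooled in one state, assuming diagonal covariances; the function names, the inclusion of the mixture weight in the score, and the handling of an empty cluster are assumptions of this sketch and are not taken from HINIT itself.

```python
import math

def log_gauss_diag(x, mean, var):
    """Log density of vector x under a diagonal-covariance Gaussian."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )

def reestimate_mixture(vectors, components):
    """components is a list of (weight, mean, var) tuples (hypothetical layout).

    Each vector is assigned to the single best-scoring component, then every
    component is re-estimated from the vectors assigned to it.
    """
    clusters = [[] for _ in components]
    for x in vectors:
        scores = [
            (math.log(w) if w > 0 else float("-inf")) + log_gauss_diag(x, m, v)
            for (w, m, v) in components
        ]
        clusters[scores.index(max(scores))].append(x)

    updated = []
    for (w, m, v), cluster in zip(components, clusters):
        weight = len(cluster) / len(vectors)
        if not cluster:
            # Assumption: an empty cluster keeps its old mean and variance.
            updated.append((weight, m, v))
            continue
        n, dim = len(cluster), len(m)
        mean = [sum(x[d] for x in cluster) / n for d in range(dim)]
        var = [sum((x[d] - mean[d]) ** 2 for x in cluster) / n for d in range(dim)]
        updated.append((weight, mean, var))
    return updated
```

In this sketch the new mixture weight is simply the fraction of the state's vectors that fell into each cluster.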
In the absence of an initial model, the process
is started by performing a uniform
segmentation of each training observation and, for mixture Gaussians,
the vectors in each uniform segment are clustered using a modified K-Means
algorithm.
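As a rough illustration of the uniform start-up step, the sketch below divides each observation sequence into equal-length runs, one per emitting state, and pools the vectors of each run to obtain a single-Gaussian mean and variance. The function names are hypothetical, every sequence is assumed to contain at least as many frames as there are states, and the modified K-Means clustering used for mixtures is not shown.

```python
def uniform_segments(num_frames, num_states):
    """Split frame indices 0..num_frames-1 into num_states contiguous runs."""
    bounds = [i * num_frames // num_states for i in range(num_states + 1)]
    return [list(range(bounds[i], bounds[i + 1])) for i in range(num_states)]

def pooled_mean_var(vectors):
    """Per-dimension mean and variance of a pool of vectors."""
    n, dim = len(vectors), len(vectors[0])
    mean = [sum(v[d] for v in vectors) / n for d in range(dim)]
    var = [sum((v[d] - mean[d]) ** 2 for v in vectors) / n for d in range(dim)]
    return mean, var

def init_from_uniform(sequences, num_states):
    """Pool vectors state by state over all sequences, then estimate each state."""
    pools = [[] for _ in range(num_states)]
    for seq in sequences:
        for state, frames in enumerate(uniform_segments(len(seq), num_states)):
            pools[state].extend(seq[t] for t in frames)
    # Assumption: no pool is empty (each sequence has >= num_states frames).
    return [pooled_mean_var(pool) for pool in pools]
```

For example, a 10-frame sequence and 3 states gives runs of 3, 3 and 4 frames.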
HINIT can be used to provide initial estimates of whole-word models, in which case the observation sequences are realisations of the corresponding vocabulary word. Alternatively, HINIT can be used to generate initial estimates of seed HMMs for phoneme-based speech recognition. In this latter case, the observation sequences consist of segments of continuously spoken training material. Given a segment label, HINIT cuts these segments out of the training data automatically.
In both of the above applications, HINIT normally takes
as input a prototype
HMM definition which defines the required HMM topology, i.e. it has
the form of the required HMM except that means, variances and mixture
weights are ignored. The
transition matrix of the prototype specifies both the allowed
transitions and their initial probabilities. Transitions which
are assigned zero probability will remain zero and hence denote
non-allowed transitions. HINIT estimates transition probabilities
by counting the number of times each state is visited during
the alignment process.
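One way to picture the counting step is sketched below: transitions are tallied along each aligned state path and each row is then normalised. This is a simplified illustration only; the function name is hypothetical, states are numbered from 0, non-emitting entry and exit states are ignored, and the exact bookkeeping used by HINIT may differ.

```python
def estimate_transitions(paths, num_states):
    """Estimate a transition matrix from aligned state paths by counting."""
    counts = [[0] * num_states for _ in range(num_states)]
    for path in paths:
        for src, dst in zip(path, path[1:]):
            counts[src][dst] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        # A transition never taken in any alignment stays at zero.
        matrix.append([c / total if total else 0.0 for c in row])
    return matrix
```

Because a Viterbi path can only use transitions with non-zero probability, transitions disallowed in the prototype are never counted and so remain zero.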
HINIT supports multiple mixtures, multiple streams, parameter tying within a single model, full or diagonal covariance matrices, tied-mixture models and discrete models. The output of HINIT is typically input to HRest.
Like all re-estimation tools, HINIT allows a floor to be set on each individual variance by defining a variance floor macro for each data stream (see chapter 8).
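The effect of a variance floor on a diagonal covariance can be sketched as an element-wise maximum. The function name and the list representation are assumptions made for illustration; the actual macro syntax is not reproduced here.

```python
def apply_variance_floor(variances, floor):
    """Raise each variance to at least the corresponding floor value."""
    return [max(v, f) for v, f in zip(variances, floor)]
```

A floor of this kind keeps a variance estimated from very few vectors from collapsing towards zero.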