This program is used to perform a single re-estimation of the parameters of a set of HMMs using an embedded training version of the Baum-Welch algorithm. Training data consists of one or more utterances, each of which has a transcription in the form of a standard label file (segment boundaries are ignored). For each training utterance, a composite model is effectively synthesised by concatenating the phone models given by the transcription. Each phone model has the same set of accumulators allocated to it as are used in HRest, but in HEREST they are updated simultaneously by performing a standard Baum-Welch pass over each training utterance using the composite model.
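The concatenation step can be pictured with a minimal Python sketch. This is not HEREST's internal representation: the function name `build_composite`, the dictionary of state counts, and the example phone inventory are all assumptions made for illustration. Each composite state keeps a reference to the phone model that owns it, so repeated occurrences of a phone in the transcription feed the same accumulators.

```python
def build_composite(transcription, states_per_phone):
    """Concatenate per-phone emitting states into one composite state list.

    Each composite state is tagged (position, phone, local_state) so that
    statistics gathered on the composite model can be credited to the
    accumulators of the phone model that owns the state.
    """
    composite = []
    for position, phone in enumerate(transcription):
        for local_state in range(states_per_phone[phone]):
            composite.append((position, phone, local_state))
    return composite


# Hypothetical inventory: every phone model has three emitting states.
states_per_phone = {"sil": 3, "k": 3, "ae": 3, "t": 3}
transcription = ["sil", "k", "ae", "t", "sil"]

composite = build_composite(transcription, states_per_phone)

# Accumulators are keyed by (phone, local_state), not by position, so both
# occurrences of "sil" update the same set of accumulators.
accumulator_keys = {(phone, state) for _, phone, state in composite}
```

Here the composite model has 15 states but only 12 distinct accumulator keys, because the two `sil` instances share one model.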
HEREST is intended to operate on HMMs with initial parameter values estimated by HInit/HRest. HEREST supports multiple mixture Gaussians, discrete and tied-mixture HMMs, multiple data streams, parameter tying within and between models, and full or diagonal covariance matrices. HEREST also supports tee-models (see section 7.7) for handling optional silence and non-speech sounds. These may be placed between the units (typically words or phones) listed in the transcriptions but they cannot be used at the start or end of a transcription.
HEREST includes features to allow parallel operation where a network of processors is available. Like all re-estimation tools, HEREST allows a floor to be set on each individual variance by defining a variance floor macro for each data stream (see chapter 8).
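The effect of a variance floor can be sketched in a few lines of Python. The function name `apply_variance_floor` and the numeric values are illustrative assumptions, and the sketch assumes a diagonal covariance with one floor value per dimension of the stream.

```python
def apply_variance_floor(variances, floor):
    """Clamp each re-estimated variance to at least its floor value."""
    return [max(v, f) for v, f in zip(variances, floor)]


# Hypothetical re-estimated variances for one Gaussian and a per-dimension floor.
variances = [0.5, 0.001, 2.0]
floor = [0.01, 0.01, 0.01]

floored = apply_variance_floor(variances, floor)
```

Only the second component falls below its floor, so it alone is raised; the others are unchanged.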
HEREST also supports single pass retraining. Given a set of well-trained models, a set of new models using a different parameterisation of the training data can be generated in a single pass. This is done by computing the forward and backward probabilities using the original well-trained models and the original training data, but then switching to a new set of training data to compute the new parameter estimates.
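A minimal sketch of the idea, assuming single-Gaussian states and scalar observations: the state occupation probabilities `gamma` are taken as already computed from the original models and original data, and they are then used to weight the new parameterisation of the same frames. The function name `reestimate_means` and all values are hypothetical, and every state is assumed to have non-zero total occupancy.

```python
def reestimate_means(gamma, new_observations):
    """Estimate per-state means from new observations.

    gamma[t][j] is the occupation probability of state j at frame t, computed
    with the ORIGINAL models on the ORIGINAL parameterisation. The new
    observations must be frame-aligned with the original ones.
    """
    num_frames = len(gamma)
    num_states = len(gamma[0])
    means = []
    for j in range(num_states):
        occupancy = sum(gamma[t][j] for t in range(num_frames))
        weighted_sum = sum(gamma[t][j] * new_observations[t] for t in range(num_frames))
        means.append(weighted_sum / occupancy)  # assumes occupancy > 0
    return means


# Hypothetical occupancies for 3 frames and 2 states, from the original models.
gamma = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
# The same 3 frames under the new parameterisation.
new_observations = [2.0, 4.0, 6.0]

new_means = reestimate_means(gamma, new_observations)
```

The alignment comes entirely from the old models, which is why the two data sets must describe the same utterances frame by frame.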
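HEREST operates in two distinct stages.
1. In stage 1, one of the following two options applies:
   (a) Each input data file contains training data, which is processed, and the accumulators for state occupation, state transition, means and variances are updated.
   (b) Each input data file contains a dump of the accumulators produced by previous runs of the program. These are read in and added together to form a single set of accumulators.
2. In stage 2, one of the following two options applies:
   (a) The accumulators are used to calculate new estimates for the HMM parameters.
   (b) The accumulators are dumped to a file.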
Thus, on a single processor the default combination 1(a) and 2(a) would be used. However, if N processors are available then the training data would be split into N equal groups and HEREST would be set to process one data set on each processor using the combination 1(a) and 2(b). When all processors had finished, the program would then be run again using the combination 1(b) and 2(a) to load in the partial accumulators created by the N processors and do the final parameter re-estimation. The choice of which combination of operations HEREST will perform is governed by the -p option switch as described below.
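The reason the parallel scheme gives the same answer as a single run is that the accumulators are sums, so partial sums can be added together before re-estimation. The Python sketch below illustrates this for a single Gaussian with scalar data. The function names are hypothetical, and each frame is counted with occupancy 1 as a simplification; Baum-Welch accumulates fractional occupation probabilities instead.

```python
def accumulate(observations):
    """Gather sufficient statistics from one partition of the data."""
    return {
        "occ": float(len(observations)),
        "sum": sum(observations),
        "sumsq": sum(x * x for x in observations),
    }


def merge(partials):
    """Add partial accumulators together into a single set."""
    total = {"occ": 0.0, "sum": 0.0, "sumsq": 0.0}
    for acc in partials:
        for key in total:
            total[key] += acc[key]
    return total


def reestimate(acc):
    """Compute the mean and variance from the accumulated statistics."""
    mean = acc["sum"] / acc["occ"]
    variance = acc["sumsq"] / acc["occ"] - mean * mean
    return mean, variance


data = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
partitions = [data[0:2], data[2:4], data[4:6]]  # N = 3 hypothetical processors

partials = [accumulate(p) for p in partitions]   # one run per processor
parallel_result = reestimate(merge(partials))    # final merging run
serial_result = reestimate(accumulate(data))     # single-processor equivalent
```

Both routes yield the same mean and variance, which is what allows the final run to ignore how the data was split.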
As a further performance optimisation, HEREST will also prune the α and β matrices, that is, the forward and backward probability matrices. By this means, a factor of 3 to 5 speed improvement and a similar reduction in memory requirements can be achieved with negligible effects on training performance (see the -t option below).
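One common way to prune such a matrix is a beam: at each frame, only states whose log probability lies within a fixed distance of the best state are kept. The Python sketch below shows that general idea only. It is not a description of HEREST's exact pruning rule, and the function name `prune_column` and the beam width are assumptions.

```python
def prune_column(log_probs, beam):
    """Return indices of states whose log probability is within `beam` of the best."""
    best = max(log_probs)
    return [j for j, lp in enumerate(log_probs) if lp >= best - beam]


# Hypothetical log probabilities for four states at one frame.
column = [-120.0, -95.0, -90.0, -300.0]
active_states = prune_column(column, beam=50.0)
```

With a beam of 50 the last state is dropped, so later computation at this frame touches three states instead of four. A wider beam keeps more states and trades speed for safety.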