
2.4 What's New in Version 2.0?

Whilst every effort has been made to maintain compatibility with earlier versions of HTK, some changes have been unavoidable. Version 2.0 also contains many new features. For the benefit of existing HTK users, this section lists the main changes in HTK Version 2.0 compared to the preceding Version 1.5.

The HTK libraries in HTKLib have been rewritten and greatly extended: HSPIO has been deleted and 11 new modules have been added.
The source directory HTKResearch has been deleted. The tools HCOMPV, HCOPY and HSMOOTH have been moved to the main tools directory HTKTools. The functionality of the old HALIGN tool is now included in HVITE and the remaining tools in HTKResearch have been deleted.
The tools HLAB2NET and HCODE have been deleted. The functionality of the former is subsumed within HVITE and that of the latter within HCOPY.
The tools HBUILD, HDMAN, HPARSE and HQUANT have been added.
The command line options have changed in a number of tools. Most changes follow from the enhancements made to functionality. However, the standard options denoted by capital letters have been rationalised and extended. New standard command line options include echoing the command line (-A), loading a configuration file (-C), displaying a configuration file and showing which parameters have been referenced (-D), and printing version information (-V).
The handling of strings (particularly for labels and HMMs) has been rationalised. This has resulted in minor changes to the way MLFs are handled. Version 1.5 behaviour can be restored by setting the configuration variable V1COMPAT to true in HLABEL.
The use of UNIX environment variables has been replaced by configuration variables. One consequence of this is that library functions are now much more configurable (see section 4.3). Also, all library functions now have trace facilities.
All of the main input and output file types can now be input or output via filters. This allows standard compression tools and user-defined translations to be applied.
HMM definition file formats have changed. All definitions are now macros including HMMs. The definition of a HMM set can be distributed across multiple files. The <Use> clause is now ignored. Version 1.5 model definitions can be read by Version 2.0 provided that a global options macro is loaded prior to reading any HMM definition (see section 7.2). HMM definition files are loaded using the standard (-H) option and the target directory for new definition files is specified with the (-M) option.
HMM definition files can be stored in binary format. Binary and text formats can be mixed and loading is transparent.
Discrete probabilities are now supported and the HMM definition format has been extended accordingly.
The internal representation of HMMs has changed to allow multiple model sets to be manipulated and to allow efficient storage of discrete probability and tied-mixture systems. Also, HMM covariance matrices are now stored in triangular form internally.
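As a rough illustration of why triangular storage is economical: a covariance matrix is symmetric, so only the n(n+1)/2 entries on or below the diagonal need to be kept. The sketch below is written in Python purely for illustration. The function names pack_lower and unpack_lower are hypothetical, and the code makes no claim to reproduce the data structures actually used inside HTK.

```python
def pack_lower(matrix):
    """Keep only the entries on or below the diagonal, row by row."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1)]


def unpack_lower(packed, n):
    """Rebuild the full symmetric matrix from its packed lower triangle."""
    full = [[0.0] * n for _ in range(n)]
    k = 0
    for i in range(n):
        for j in range(i + 1):
            # Symmetry: the (i, j) and (j, i) entries share one stored value.
            full[i][j] = full[j][i] = packed[k]
            k += 1
    return full


# Illustrative 3x3 covariance matrix (made-up values).
cov = [[4.0, 1.0, 0.5],
       [1.0, 3.0, 0.2],
       [0.5, 0.2, 2.0]]
packed = pack_lower(cov)          # 6 values instead of 9
restored = unpack_lower(packed, 3)
```

For a 3x3 matrix this stores 6 values rather than 9; the saving approaches one half as the dimension grows.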
Memory management is now controlled centrally. This allows tracing of memory usage and avoids the time/space overheads of using malloc and free.
The speech input subsystem has been replaced. Conversions directly from waveform can be performed on-the-fly and speech can be input via an audio device. Some minor changes have been made to accommodate this and hence the parameterised values produced by V2.0 may differ slightly. To obtain identical output when processing pre-stored waveform files the Boolean configuration variable V1COMPAT should be set true in HPARM.
A VQ codebook can now be loaded and speech parameter vectors can have a VQ symbol attached.
The ESPS file format has been replaced by the new Entropic Esignal format. Unlike ESPS, Esignal is a public format and will be common to all future Entropic products. ESPS files can, however, still be manipulated directly by attaching Esignal-ESPS filters to HTK's input and output streams.
The HTK label file format has been extended to include multiple levels as well as multiple alternatives. All changes should be backwards compatible.
The operation of HINIT has been changed for mixture Gaussians. Previously, clustering within each state was performed every iteration. Now it is only used on the first iteration; thereafter the Viterbi state and component alignment is used, and mixture weights are calculated from occupancy counts instead of cluster size.
If a pruning error occurs, HEREST will now automatically reprocess the utterance with a higher pruning threshold. This allows tight pruning to be set for the majority of cases, allowing much faster processing.
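The retry behaviour can be pictured as a simple loop that widens the threshold each time pruning fails. The following Python sketch only illustrates that control flow. The names PruningError, process_with_retries and fake_process, the increment and limit parameters, and the choice to give up once the limit is passed are all assumptions of the sketch, not a description of HEREST's internals.

```python
class PruningError(Exception):
    """Hypothetical signal that the pruning threshold was too tight."""


def process_with_retries(process, utterance, initial, increment, limit):
    """Call process(utterance, threshold), raising the threshold after each failure.

    Returns (result, threshold_used). Re-raises PruningError once the next
    threshold would exceed `limit` (an assumption of this sketch).
    """
    threshold = initial
    while True:
        try:
            return process(utterance, threshold), threshold
        except PruningError:
            if threshold + increment > limit:
                raise
            threshold += increment


def fake_process(utterance, threshold):
    # Stand-in for real training code: this utterance needs a threshold of at least 400.
    if threshold < 400.0:
        raise PruningError(utterance)
    return "ok"


result = process_with_retries(fake_process, "utt1",
                              initial=250.0, increment=150.0, limit=1000.0)
```

Most utterances would succeed on the first, tight setting, so the cost of the wider threshold is paid only for the few that need it.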
HEREST can now perform single-pass retraining in which the state/component occupation probabilities are calculated using an existing model and training set, but the new model parameters are calculated using a new training set.
HHED has been extended with 9 new commands including facilities to apply phonetic decision tree clustering to tie states or models (TB command). The trees built during this process can be saved and later reloaded to synthesise triphones which were not seen in the data (ST, LT and AU commands).
Command names in HLED now consist of two letters and a number of new commands have been added. For backwards compatibility, the one-letter command names of Version 1.5 are still recognised. HLED has also been modified to allow multiple levels to be edited by switching between levels (using the ML move level command). HLED can also use a dictionary to expand word labels to phone labels (see the EX command).
The recognition tool HVITE has been completely rewritten. It is now word-based rather than phone-based and word networks are now compiled off-line. Backwards compatibility is provided by the tool HPARSE which will convert the old grammar format to an equivalent word network. Version 1.5 grammar files included the pronunciation of each word within the grammar definition. To allow existing files in this format to be processed by Version 2.0, HPARSE has an option to output a dictionary built from these embedded word pronunciations.
In Version 1.5, any name appearing in a grammar definition for which no further definition could be found was assumed to be the name of a HMM. In Version 2.0, every name in a grammar or word network must have a dictionary entry. Hence, to set up a phone recogniser or a whole word recogniser, it is now necessary to have a dictionary in which the name of the appropriate HMM is given as the pronunciation for each vocabulary item.
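Such a dictionary can be generated mechanically from the list of HMM names. The Python sketch below assumes the simplest dictionary layout, one entry per line with the vocabulary item followed by its pronunciation. The function name phone_dictionary and the example HMM names are hypothetical.

```python
def phone_dictionary(hmm_names):
    """Return dictionary lines in which each item is pronounced as the HMM of the same name."""
    # Duplicates are dropped and entries sorted so the output is stable.
    return ["{0} {0}".format(name) for name in sorted(set(hmm_names))]


lines = phone_dictionary(["sil", "aa", "b", "aa"])
print("\n".join(lines))
```

Each output line, for example "aa aa", states that the vocabulary item aa is realised by the single HMM named aa.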
HVITE has considerable extra functionality including provision for cross-word triphones, multiple tokens, lattice generation and N-best output. It can also perform forced alignment (subsuming the old HALIGN tool) and rescoring of output lattices.
HTK now supports back-off bigrams by allowing probabilities to be attached to the transitions in a word network. HLSTATS has been extended to compute the bigram probabilities and HBUILD can automatically build the word network from a word list.
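The general idea of a back-off bigram is that an explicitly stored bigram probability is used when one exists, and otherwise the unigram probability of the word is scaled by a back-off weight for the preceding word. The Python sketch below shows that lookup on made-up values. It illustrates the generic scheme only and is not the estimation procedure used by HLSTATS; all names in it are hypothetical.

```python
def backoff_bigram_prob(prev, word, bigrams, unigrams, backoff):
    """Return P(word | prev): the explicit bigram if present, else a scaled unigram."""
    if (prev, word) in bigrams:
        return bigrams[(prev, word)]
    return backoff[prev] * unigrams[word]


# Made-up probabilities for a three-word vocabulary.
unigrams = {"one": 0.5, "two": 0.3, "three": 0.2}
bigrams = {("one", "two"): 0.6}
# Back-off weight chosen so that the probabilities following "one" sum to 1:
# mass left after the seen bigrams / unigram mass of the unseen successors.
backoff = {"one": (1.0 - 0.6) / (1.0 - 0.3)}

probs = {w: backoff_bigram_prob("one", w, bigrams, unigrams, backoff)
         for w in unigrams}
```

In a word network, values of this kind would be the probabilities attached to the transitions leaving the word "one".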
HSLAB has been extended to allow point and click region selection, direct audio recording and the annotation of multiple level label files.
Error numbers in Version 2.0 are now grouped into major and minor numbers so that the lower 2 digits identify the error and the upper 2 digits identify the module or tool.
The HTKDemo has been rewritten and now offers the user control over the demonstration of a wide range of the features in HTK.
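The major/minor error numbering mentioned in the list above can be decoded with integer division. The Python sketch below assumes a four-digit error number; the function name split_error_number is hypothetical and the example value 5010 is arbitrary rather than taken from HTK's error listings.

```python
def split_error_number(errnum):
    """Split a four-digit error number into (module_or_tool, error) parts."""
    if not 0 <= errnum <= 9999:
        raise ValueError("expected an error number of at most four digits")
    # Upper two digits identify the module or tool, lower two the error itself.
    major, minor = divmod(errnum, 100)
    return major, minor


parts = split_error_number(5010)
```

Here parts holds the pair (50, 10): module or tool 50, error 10.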

2.4.1 What's New in Version 2.1?

