Next: 2.4.1 Whats New in Version 2.1?
Up: 2 An Overview of the HTK Toolkit
Previous: 2.3.4 Analysis Tool
Whilst every effort has been made to maintain compatibility with
earlier versions of HTK, some changes have been unavoidable. Also,
of course, Version 2.0 contains many new features.
For the benefit of existing HTK users, this section lists the main
changes in HTK Version 2.0 compared to the preceding Version 1.5.
- The HTK libraries in HTKLib have been rewritten and
greatly extended. HSPIO has been
deleted and 11 new modules have been added.
- The source directory
HTKResearch has been deleted.
The tools HCOMPV, HCOPY and HSMOOTH have been moved to
the main tools directory HTKTools. The functionality
of the old HALIGN tool
is now included in HVITE and the
remaining tools in HTKResearch have been deleted.
- The tools HLAB2NET
and HCODE have been deleted. The
functionality of the former is subsumed within HVITE and
the latter within HCOPY.
- The tools HBUILD, HDMAN, HPARSE and HQUANT
have been added.
- The command line options have
changed in a number of tools. Most
changes follow from the enhancements made to functionality. However,
the standard options denoted by capital letters have been rationalised
and extended. New standard command line options include echoing
the command line (-A), loading a configuration file (-C),
displaying a configuration file and showing which parameters have been
referenced (-D), and printing version information (-V).
- The handling of strings (particularly for labels and HMMs) has
been rationalised. This has resulted in minor changes to the way MLFs
are handled. Version 1.5 behaviour can be restored by setting
the configuration variable V1COMPAT to true in HLABEL.
- The use of UNIX environment variables has been replaced by
configuration variables. One consequence of this is that library functions
are now much more configurable (see section 4.3). Also, all
library functions now have trace facilities.
- All of the main input and output file types can now be input or
output via filters. This allows standard compression tools and user-defined
translations to be applied.
- HMM definition file formats have changed. All definitions,
including HMMs, are now macros. The definition of a HMM set can be distributed
across multiple files.
The <Use>
clause is now ignored. Version 1.5
model definitions can be read by Version 2.0 provided that a global
options macro is loaded prior to reading any HMM definition
(see section 7.2). HMM definition files are loaded
using the standard (-H) option and the target directory
for new definition files is specified with the (-M) option.
- HMM definition files can be stored in binary format. Binary
and text formats can be mixed and loading is transparent.
- Discrete probabilities are now supported and the HMM definition
format has been extended accordingly.
- The internal representation of HMMs has changed to allow multiple
model sets to be manipulated and to allow efficient storage of
discrete probability and tied-mixture systems. Also, HMM covariance
matrices are now stored in triangular form internally.
- Memory management is now controlled centrally. This allows
tracing of memory usage and avoids the time/space overheads of
using malloc and free.
- The speech input subsystem has been replaced. Conversions directly
from waveform can be performed on-the-fly and speech can be input via
an audio device. Some minor changes have been
made to accommodate this
and hence the parameterised values produced by V2.0 may
differ slightly. To obtain identical output when processing pre-stored
waveform files the Boolean configuration variable
V1COMPAT should be set
true in HPARM.
- A VQ codebook can now be
loaded and speech parameter vectors
can have a VQ symbol attached.
- The ESPS file format has been replaced by
the new Entropic Esignal
format. Unlike ESPS, Esignal is a public format and will be common to
all future Entropic products. ESPS files can still be manipulated
directly, however, by attaching Esignal-ESPS filters to HTK's input
and output streams.
- The HTK label file format has been extended to include
multiple levels as well as multiple alternatives. All changes should
be backwards compatible.
- The operation of HINIT has been changed for mixture Gaussians.
Previously, clustering within each state was performed every iteration.
Now it is only used on the first iteration; thereafter the Viterbi state
and component alignment is used and mixture weights are calculated from
occupancy counts instead of cluster size.
- If a pruning error occurs, HEREST will now automatically reprocess
the utterance with a higher pruning threshold. This allows
tight pruning to be set for the majority of cases, allowing much faster
processing.
- HEREST can now perform single-pass retraining in which
the state/component occupation probabilities are calculated using
an existing model and training set, but the new model parameters are
calculated using a new training set.
- HHED has been extended with 9 new commands including
facilities to apply phonetic decision tree clustering to tie states
or models (TB command). The trees built during this process
can be saved and later reloaded to synthesise triphones which
were not seen in the data (ST, LT and AU commands).
- Command names in HLED now consist of 2 letters and a number
of new commands have been added. For backwards compatibility, the 1-letter
command names of Version 1.5 are still recognised. HLED has also been
modified to allow multiple levels to be edited by switching between
levels (using the ML move level command). HLED
can also use a dictionary to expand word labels to phone labels
(see the EX command).
- The recognition tool HVITE has been completely rewritten.
It is now word-based rather than phone-based and
word networks are now compiled off-line. Backwards compatibility is
provided by the tool HPARSE which will convert the old grammar
format to an equivalent word network. Version 1.5 grammar files
included the pronunciation of each word within the grammar definition.
To allow existing files in this format to be processed by Version 2.0,
HPARSE has an option to output a dictionary built from these
embedded word pronunciations.
- In Version 1.5, any name appearing in a grammar definition
for which no further definition could be found was assumed to be
the name of a HMM. In Version 2.0 every name in a grammar or word network
must have a dictionary entry. Hence, to set up a
phone recogniser or a whole word recogniser it is now necessary
to have a dictionary in which the name of the appropriate HMM is
given as the pronunciation for each vocabulary item.
- HVITE has considerable extra functionality including
provision for cross-word triphones, multiple tokens, lattice generation
and N-best output. It can also perform forced alignment (subsuming
the old HALIGN tool) and rescoring of output lattices.
- HTK now supports back-off bigrams by allowing probabilities
to be attached to the transitions in a word network. HLSTATS
has been extended to compute the bigram probabilities and HBUILD
can automatically build the word network from a word list.
- HSLAB has been extended to allow point and click region
selection, direct audio recording and the annotation of multiple level
label files.
- Error numbers in Version 2.0 are now grouped into major and minor
numbers so that the lower 2 digits identify the error and the upper 2 digits
identify the module or tool.
- The HTKDemo has been rewritten and now offers the user
control over the demonstration of a wide range of the features in HTK.
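The change to HINIT noted above, in which mixture weights are calculated from occupancy counts, can be illustrated with a short Python sketch. It assumes that each weight is simply the component's occupancy count divided by the total count for the state; the function name and the sample counts are hypothetical and are not taken from the HTK source.

```python
def mixture_weights_from_counts(occupancy_counts):
    """Normalise per-component occupancy counts into mixture weights.

    Assumption for illustration: weight = component count / total count.
    """
    total = sum(occupancy_counts)
    if total <= 0:
        raise ValueError("total occupancy must be positive")
    return [count / total for count in occupancy_counts]


if __name__ == "__main__":
    # Hypothetical counts of frames aligned to each of three components.
    print(mixture_weights_from_counts([30, 10, 10]))
```

The weights produced this way sum to one by construction, whatever the individual counts are.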
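As a rough illustration of the dictionary requirement noted above for phone or whole-word recognisers, the following Python sketch builds dictionary lines in which each vocabulary item is given the HMM of the same name as its pronunciation. The function name, the example model names and the one-entry-per-line "item pronunciation" layout are assumptions made for illustration; the dictionary format actually expected by HTK should be checked before use.

```python
def make_identity_dictionary(hmm_names):
    """Return dictionary lines mapping each item to the HMM of the same name.

    Assumed layout (not taken from the text above): one entry per line,
    the vocabulary item followed by its pronunciation, space separated.
    """
    # Sort and de-duplicate so each vocabulary item appears exactly once.
    unique_names = sorted(set(hmm_names))
    return [f"{name} {name}" for name in unique_names]


if __name__ == "__main__":
    # Hypothetical phone-model names, used only for illustration.
    for line in make_identity_dictionary(["sil", "aa", "b", "aa"]):
        print(line)
```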
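The grouping of error numbers into major and minor parts can be sketched in Python as follows. This is a minimal illustration assuming a four-digit decimal error number; the function name and the sample value 1234 are hypothetical and do not correspond to any particular HTK error.

```python
def split_error_number(error_number):
    """Split a four-digit error number into (major, minor) parts.

    major: upper two digits, identifying the module or tool.
    minor: lower two digits, identifying the error itself.
    """
    if not 0 <= error_number <= 9999:
        raise ValueError("expected an error number of at most four digits")
    major, minor = divmod(error_number, 100)
    return major, minor


if __name__ == "__main__":
    # 1234 is an arbitrary sample value, not a documented HTK error.
    major, minor = split_error_number(1234)
    print(f"module/tool group {major}, error {minor}")
```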
ECRL HTK_V2.1: email [email protected]