- ...weight
- Often referred to as a codebook exponent.
- ...probability
-
Since the output distributions are densities, these are not
really probabilities, but treating them as such is a convenient fiction.
- ...non-emitting
-
To understand equations involving a non-emitting state at time t, the time
should be thought of as being $t - \delta t$ if it is an entry state, and $t + \delta t$
if it is an exit state. This becomes important when HMMs are connected together
in sequence so that transitions across non-emitting states take place
between frames.
- ...accumulators
-
Note that normally the summations in
the denominators of the re-estimation formulae are identical
across the parameter sets of a given state and therefore
only a single common storage location for the denominators
is required and it need only be calculated once. However,
HTK supports a generalised parameter tying mechanism
which can result in the denominator summations being
different. Hence, in HTK the denominator summations
are always stored and calculated individually
for each distinct parameter vector or matrix.
- ...operation
-
They can even be avoided altogether by using a flat start
as described in section 8.3.
- ...minor
-
In practice, a good deal of extra work is needed to achieve
efficient operation on large training databases. For example,
the HEREST tool includes facilities for
pruning on both the forward and backward passes and
parallel operation on a network of machines.
- ...Model
-
See ``Token Passing: a Conceptual Model for Connected Speech
Recognition Systems'', SJ Young, NH Russell and JHS Thornton,
CUED Technical Report F_INFENG/TR38, Cambridge University, 1989.
Available by anonymous ftp from svr-ftp.eng.cam.ac.uk.
- ...used
- Available by anonymous ftp from
svr-ftp.eng.cam.ac.uk/comp.speech/dictionaries/beep.tar.gz.
Note that items beginning with unmatched quotes, found at the start
of the dictionary, should be removed.
- ...files
-
Not to be confused with files containing edit scripts.
- ...arguments
-
Most UNIX shells, especially the C shell, only allow a limited and
quite small number of arguments.
- ...together
-
Note that if the transition matrices had not been tied, the CO
command would be ineffective since all models would be different by
virtue of their unique transition matrices.
- ...is
- All of the examples in this book assume the
UNIX Operating System and the C Shell but the principles apply to
any OS which supports hierarchical files and command line arguments.
- ...element)
- Block sizes typically grow as
more blocks are allocated.
- ...retries
- This does not work if input filters are used.
- ...units
-
The somewhat bizarre choice of 100 ns units originated in Version 1 of
HTK, when times were represented by integers and this unit was the best
compromise between precision and range. Times are now represented by
doubles, so those constraints no longer apply. However, the need for backwards
compatibility means that 100 ns units have been retained. The names
SOURCERATE and TARGETRATE are also non-ideal;
SOURCEPERIOD and TARGETPERIOD would be better.
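
As a rough illustration of what the 100 ns unit means in practice, the sketch below converts a sampling rate and a frame shift into period values of the kind SOURCERATE and TARGETRATE hold. The helper names, the 16 kHz sampling rate and the 10 ms frame shift are illustrative assumptions, not values taken from HTK.

```python
# HTK expresses times in units of 100 ns, so one second is 10,000,000 units.
HTK_UNITS_PER_SECOND = 10_000_000


def seconds_to_htk_units(seconds: float) -> float:
    """Convert a duration in seconds to 100 ns units (hypothetical helper)."""
    return seconds * HTK_UNITS_PER_SECOND


def sample_rate_to_period(hz: float) -> float:
    """Convert a sampling rate in Hz to a sample period in 100 ns units."""
    return HTK_UNITS_PER_SECOND / hz


# Assumed example values: 16 kHz audio and a 10 ms frame shift.
source_period = sample_rate_to_period(16000)   # period of one sample
target_period = seconds_to_htk_units(0.010)    # period of one frame

print(source_period, target_period)
```

Both results are periods rather than rates, which is why the footnote suggests that names ending in PERIOD would describe them better.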
- ...input
- This method of applying a zero mean is different to
HTK Version 1.5 where the mean was calculated and subtracted from the
whole speech file in one operation. The configuration variable
V1COMPAT can be set to revert to this older behaviour.
- ...databases
- Many of the
more recent speech databases use compression. In these cases, the data may be
regarded as being logically encoded as a sequence of 2-byte integers even if
the actual storage uses a variable length encoding scheme.
- ...function
-
Note that some textbooks define the denominator of equation 5.4
as $1 - \sum_{i=1}^{p} a_i z^{-i}$ so that the filter coefficients are the
negatives of those computed by HTK.
- ...parameterisation
- In any
event, setting the compatibility variable V1COMPAT to true in
HPARM will ensure that the calculation of energy is compatible with
that computed by the Version 1 tool HCODE.
- ...173#173
- Unless V1COMPAT is
set to true.
- ...discarded
-
Some applications may require the 0'th order cepstral coefficient
in order to recover the filterbank coefficients from the cepstral
coefficients.
- ...letters
-
Some command names have single
letter alternatives for compatibility with
earlier versions of HTK.
- ...234#234
-
No current HTK tool can estimate or use these.
- ...element
-
Covariance matrices are actually stored internally in lower triangular
form.
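
To illustrate why lower triangular storage suffices for a symmetric covariance matrix, here is a minimal sketch that packs and unpacks one. The row-by-row packed layout and the function names are assumptions made for illustration; they are not taken from HTK's internal data structures.

```python
def pack_lower(matrix):
    """Return the lower triangle (diagonal included), row by row."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1)]


def unpack_lower(packed, n):
    """Rebuild the full symmetric n x n matrix from its packed lower triangle."""
    full = [[0.0] * n for _ in range(n)]
    k = 0
    for i in range(n):
        for j in range(i + 1):
            # Symmetry: element (i, j) equals element (j, i).
            full[i][j] = full[j][i] = packed[k]
            k += 1
    return full


# Assumed example covariance matrix (symmetric by construction).
cov = [
    [4.0, 1.0, 0.5],
    [1.0, 3.0, 0.2],
    [0.5, 0.2, 2.0],
]
packed = pack_lower(cov)          # 6 values instead of 9
restored = unpack_lower(packed, 3)
print(packed)
```

For an n-dimensional matrix this keeps n(n+1)/2 values rather than n*n, and nothing is lost because the upper triangle mirrors the lower one.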
- ...<InvCovar>
-
The Choleski storage format is not used by default in HTK Version 2.
- ...matrix
-
Transform matrices are not used by any of the supported HTK tools.
- ...definition
- The fact that
this is possible does not mean that it is recommended practice!
- ...brackets
-
This definition covers the textual version only. The syntax for
the binary format
is identical apart from the way that the lexical items are encoded.
- ...states
-
Integer numbers are specified as either char or short.
This has no effect on text-based definitions but for binary format it indicates
the underlying C type used to represent the number.
- ...mapping
- The physical
HMM which corresponds to several logical HMMs will be arbitrarily named after
one of them.
- ...likely
-
Remember that discrete probabilities are scaled such that
32767 is equivalent to a probability of 0.000001 and 0 is
equivalent to a probability of 1.0.
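
The two end points quoted above are consistent with a negative-log scaling. The sketch below assumes exactly that: a stored integer proportional to the negative natural logarithm of the probability, with the scale factor derived from the two end points. The logarithmic form, the rounding and the function names are assumptions for illustration rather than a statement of HTK's implementation.

```python
import math

MAX_CODE = 32767      # stored value for the smallest probability
MIN_PROB = 1e-6       # probability represented by MAX_CODE

# Scale chosen so that MIN_PROB maps to MAX_CODE and 1.0 maps to 0.
SCALE = MAX_CODE / -math.log(MIN_PROB)


def prob_to_code(p: float) -> int:
    """Map a probability to a scaled integer (assumed negative-log mapping)."""
    p = max(p, MIN_PROB)                      # floor very small probabilities
    return min(MAX_CODE, round(-SCALE * math.log(p)))


def code_to_prob(code: int) -> float:
    """Invert the mapping: scaled integer back to a probability."""
    return math.exp(-code / SCALE)


print(prob_to_code(1.0), prob_to_code(1e-6))
```

Under this assumed mapping, larger stored values mean less likely symbols, which is the point the footnote asks the reader to remember.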
- ...HMMs
- Also called semi-continuous HMMs in the literature.
- ...words
- More precisely, nodes represent the ends of
words and arcs represent the transitions between word ends.
This distinction becomes important when describing
recognition output since acoustic scores are attached
to arcs not nodes.
- ...definitions
-
Large HMM sets will often be distributed across a number of MMF files;
in this case, the -H option will be repeated for each file.
- ...10
- The default behaviour of HRESULTS is
slightly different to the widely used US NIST scoring software, which uses
weights of 3, 3 and 4 and a slightly different alignment algorithm. Identical
behaviour to NIST can be obtained by setting the -n option.
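
To show how the choice of weights enters a label alignment, here is a minimal dynamic-programming sketch with configurable penalties. It is not the HRESULTS or NIST algorithm. The function name and the example label sequences are hypothetical, and two things are assumed rather than taken from this footnote: that 3, 3 and 4 are the insertion, deletion and substitution weights in that order, and that the default weights are 7, 7 and 10.

```python
def alignment_cost(ref, hyp, ins_w, del_w, sub_w):
    """Minimum total penalty for aligning hyp against ref (weighted edit distance)."""
    rows, cols = len(ref) + 1, len(hyp) + 1
    cost = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        cost[i][0] = i * del_w            # all reference labels deleted
    for j in range(1, cols):
        cost[0][j] = j * ins_w            # all hypothesis labels inserted
    for i in range(1, rows):
        for j in range(1, cols):
            match = 0 if ref[i - 1] == hyp[j - 1] else sub_w
            cost[i][j] = min(
                cost[i - 1][j - 1] + match,   # match or substitution
                cost[i - 1][j] + del_w,       # deletion
                cost[i][j - 1] + ins_w,       # insertion
            )
    return cost[-1][-1]


# Hypothetical label sequences: the hypothesis has one extra label.
ref = "the cat sat".split()
hyp = "the cat sat down".split()

print(alignment_cost(ref, hyp, 3, 3, 4))    # assumed NIST-style weights
print(alignment_cost(ref, hyp, 7, 7, 10))   # assumed default weights
```

Different weights can change which alignment is cheapest on ambiguous inputs, and therefore how errors are split between substitutions, deletions and insertions.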
- ...words
-
All the examples here will assume that each label corresponds to a word
but in general the labels could stand for any recognition unit such as
phones, syllables, etc. HRESULTS does not care what the labels
mean but for human consumption, the labels SENT
and WORD can be changed using the -a and -b
options.
- ...word
-
The HLED EX command can be used to compute phone-level
transcriptions when there is only one possible phone transcription
per word.
- ...control
- The underlying signal number must be
given; HTK cannot interpret the standard Unix signal names such as
SIGINT.
- ...options
- HCOPY thus takes on the functionality of
the former HTK tool HCODE, which no longer exists in V2.0.
- ...algorithm
- This algorithm is significantly different from
earlier versions of HTK where K-means clustering was used at every
iteration and the Viterbi alignment was limited to states.
- ...ignored
- Either prototypes should
have GConst set (the value does
not matter) to stop HTK trying to compute it, or their
variances should be set to a positive value such as 1.0 to
ensure that GConst is computable.
- ...labels
- In earlier versions of
HTK, HLED command names consisted of a single letter. These
are still supported for backwards compatibility and they are included
in the command summary produced using the -Q option.
However, new commands
introduced in version 2.0 have two letter names.
- ...TARGETKIND
- TARGETKIND is equivalent to
the HCOERCE environment variable used in earlier versions
of HTK.
- ...grammar.
- The expression between
double angle brackets must be a simple list of alternative node names or
a variable which has such a list as its value.
- ...recognition
- In HTK V2 it is preferable for
these context-loop expansions to be done automatically via HNET,
to avoid requiring a dictionary entry for every context-dependent
model.
- ...used
- If the base-names or left/right context of the context-dependent names in a context-dependent loop are variables,
no $ symbols are used when writing the context-dependent
nodename.
- ...revised
- With the added benefit
of rectifying some residual bugs in the HTK V1.5 implementation.
- ...mechanism
- Using this option
only makes sense if the HMM has skip transitions.
- ...transcriptions
-
The choice of ``Sentence'' and ``Word'' here is the usual
case but is otherwise arbitrary.
HRESULTS just compares label sequences. The sequences
could be paragraphs, sentences, phrases or words, and the labels
could be phrases, words, syllables or phones, etc. Options exist
to change the output designations `SENT' and `WORD' to whatever
is appropriate.
- ...mode
- It is not, of course,
necessary to have multiple processors to use this program since each
`parallel' activation can be executed sequentially on a single
processor.