To conclude this chapter, this section presents a formal description of the HMM definition language used by HTK. Syntax is described using an extended BNF notation in which alternatives are separated by a vertical bar |, parentheses () denote factoring, brackets [ ] denote options, and braces {} denote zero or more repetitions.
All keywords are enclosed in angle brackets and the case of the keyword name is not significant. White space is not significant except within double-quoted strings.
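The lexical conventions above can be illustrated with a small tokenizer. This is a minimal sketch, not HTK's own scanner: the names TOKEN_RE and tokenize are hypothetical, and it assumes that double-quoted strings contain no escaped quotes.

```python
import re

# Hypothetical token pattern (an assumption, not HTK's scanner):
# a double-quoted string, an <angle-bracket> keyword, or any other
# run of characters that contains no white space.
TOKEN_RE = re.compile(r'"[^"]*"|<[^<>\s]+>|[^\s<>"]+')


def tokenize(text):
    """Split HMM definition text into tokens, folding keyword case."""
    tokens = []
    for match in TOKEN_RE.finditer(text):
        tok = match.group(0)
        if tok.startswith("<"):
            # Keyword case is not significant, so normalise it.
            tok = tok.upper()
        tokens.append(tok)
    return tokens


if __name__ == "__main__":
    print(tokenize('<BeginHMM>\n  <numstates>   5 "my  model"'))
```

White space inside the quoted string is preserved, while white space between tokens is discarded.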
The top level structure of a HMM definition is shown by the following
rule.
hmmdef = [ ~h macro ]
         <BeginHMM>
           [ globalOpts ]
           <NumStates> short
           state { state }
           transP
           [ duration ]
         <EndHMM>
A HMM definition consists of an optional set of global options followed by
the <NumStates> keyword whose following argument specifies the number of states in
the model inclusive of the non-emitting entry and exit states.
The information for each state is then given in turn, followed by the
parameters of the transition matrix and the model duration parameters, if any.
The name of the HMM is given by the ~h macro. If the HMM is the
only definition within a file, the ~h macro name can be omitted
and the HMM name is assumed to be the same as the file name.
The global options are common to all HMMs. They can be given
separately using a ~o option macro
optmacro = ~o globalOpts
or they can be included in one or more HMM definitions. Global options may be repeated but no definition can change a previous definition. All global options must be defined before any other macro definition is processed. In practice this means that any HMM system which uses parameter tying must have a ~o option macro at the head of the first macro file processed.
The full set of global options is given below. Every HMM set must
define the vector size (via <VecSize>), the stream widths
(via <StreamInfo>)
and the observation parameter kind. However, if only the stream
widths are given then the vector size will be inferred. If
only the vector size is given, then a single stream of identical
width will be assumed. All other options default to null.
globalOpts = option { option }
option = <StreamInfo> short { short } |
         <VecSize> short |
         covkind |
         durkind |
         parmkind
The arguments to the <StreamInfo> option are the number of streams (default 1) and then for each stream, the width of that stream. The <VecSize> option gives the total number of elements in each input vector. If both <VecSize> and <StreamInfo> are included then the sum of all the stream widths must equal the input vector size.
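The defaulting and consistency rules for <VecSize> and <StreamInfo> can be sketched as follows. The function name resolve_vector_layout and its arguments are hypothetical; only the rules themselves come from the text above.

```python
def resolve_vector_layout(vec_size=None, stream_widths=None):
    """Return (vec_size, stream_widths) after applying the rules above.

    vec_size      -- value of <VecSize>, or None if the option is absent
    stream_widths -- widths listed after <StreamInfo>, or None if absent
    """
    if vec_size is None and stream_widths is None:
        raise ValueError("need <VecSize>, <StreamInfo>, or both")
    if stream_widths is None:
        # Only the vector size is given: assume a single stream
        # of identical width.
        return vec_size, [vec_size]
    if vec_size is None:
        # Only the stream widths are given: infer the vector size.
        return sum(stream_widths), list(stream_widths)
    if sum(stream_widths) != vec_size:
        raise ValueError(
            "stream widths sum to %d but <VecSize> is %d"
            % (sum(stream_widths), vec_size)
        )
    return vec_size, list(stream_widths)


if __name__ == "__main__":
    print(resolve_vector_layout(vec_size=39))
    print(resolve_vector_layout(stream_widths=[12, 12, 12, 3]))
```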
The covkind defines the kind of the covariance matrix
covkind = <DiagC> | <InvDiagC> | <FullC> |
          <LLTC> | <XformC>
where <InvDiagC> is used internally. <LLTC> and <XformC> are not used in HTK Version 2.0. Setting the covariance kind as a global option forces all components to have this kind. In particular, it prevents mixing full and diagonal covariances within a HMM set.
The durkind denotes the type of duration
model used according to the following rules
durkind = <nullD> | <poissonD> | <gammaD> | <genD>
For anything other than <nullD>, a duration vector must be supplied for the model or each state as described below.
The parameter kind is any legal parameter kind including qualified forms
(see section 5.1)
basekind = <discrete> | <lpc> | <lpcepstra> | <mfcc> | <fbank> |
           <melspec> | <lprefc> | <lpdelcep> | <user>
parmkind = <basekind{_D|_A|_E|_N|_Z|_O}>
where the syntax rule for parmkind is non-standard in that no spaces are allowed between the base kind and any subsequent qualifiers. As noted in chapter 5, <lpdelcep> is provided only for compatibility with earlier versions of HTK and its further use should be avoided.
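Because parmkind is a single keyword with no internal spaces, it can be split on underscores. The following is a minimal sketch: parse_parmkind is a hypothetical helper, it accepts only the base kinds and qualifiers listed in the rules above, and it does not check which qualifier combinations are legal (see section 5.1).

```python
BASE_KINDS = {
    "DISCRETE", "LPC", "LPCEPSTRA", "MFCC", "FBANK",
    "MELSPEC", "LPREFC", "LPDELCEP", "USER",
}
QUALIFIERS = {"D", "A", "E", "N", "Z", "O"}


def parse_parmkind(keyword):
    """Split a keyword such as '<MFCC_E_D>' into (base kind, qualifiers)."""
    if not (keyword.startswith("<") and keyword.endswith(">")):
        raise ValueError("parmkind must be enclosed in angle brackets")
    body = keyword[1:-1]
    if any(ch.isspace() for ch in body):
        raise ValueError("no spaces are allowed inside a parmkind")
    # Keyword case is not significant, so compare in upper case.
    base, *quals = body.upper().split("_")
    if base not in BASE_KINDS:
        raise ValueError("unknown base kind: " + base)
    for qual in quals:
        if qual not in QUALIFIERS:
            raise ValueError("unknown qualifier: _" + qual)
    return base, quals


if __name__ == "__main__":
    print(parse_parmkind("<MFCC_E_D>"))
```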
Each state of each HMM must have its own section defining the parameters
associated with that state
state = <State> short stateinfo
where the short following <State> is the state number. State
information can be defined in any order. The syntax is as follows
stateinfo = ~s macro |
            [ mixes ] [ weights ] stream { stream } [ duration ]
macro = string
A stateinfo definition consists of an optional specification of the number of mixtures, an optional set of stream weights, followed by a block of information for each stream, optionally terminated with a duration vector. Alternatively, ~s macro can be written, where macro is the name of a previously defined macro.
The optional mixes in a stateinfo definition specify
the number of mixture components (or discrete codebook size) for
each stream of that state
mixes = <NumMixes> short {short}
where there should be one short for each stream. If this specification is omitted, it is assumed that all streams have just one mixture component.
The optional weights in a stateinfo definition define
a set of exponent weights for each independent data stream. The
syntax is
weights = ~w macro | <SWeights> short vector
vector = float { float }
where the short gives the number S of weights (which should match the value given in the <StreamInfo> option) and the vector contains the S stream weights (see section 7.1).
The definition of each stream depends on the kind of HMM set. In the
normal case, it consists of a sequence of mixture component definitions
optionally preceded by the stream number. If the stream number is
omitted then it is assumed to be 1. For tied-mixture and discrete HMM
sets, special forms are used.
stream = [ <Stream> short ]
         (mixture { mixture } | tmixpdf | discpdf)
The definition of each mixture component consists of a Gaussian
pdf optionally preceded by the mixture number and its weight
mixture = [ <Mixture> short float ] mixpdf
If the <Mixture> part is missing then mixture 1 is assumed and the weight defaults to 1.0.
The tmixpdf option is used only for fully
tied mixture sets. Since the mixpdf parts are all macros in
a tied mixture system and since they are identical for every stream
and state, it is only necessary to know the mixture weights. The
tmixpdf syntax allows these to be specified in the following
compact form
tmixpdf = <TMix> macro weightList
weightList = repShort { repShort }
repShort = short [ * char ]
where each short is a mixture component weight scaled so that a weight of 1.0 is represented by the integer 32767. The optional asterisk followed by a char indicates a repeat count. For example, 0*5 is equivalent to 5 zeroes. The Gaussians which make up the pool of tied mixtures are defined using ~m macros called macro1, macro2, macro3, etc.
Discrete probability HMMs are defined in a similar way
discpdf = <DProb> weightList
The only difference is that the weights in the weightList are scaled log probabilities as defined in section 7.6.
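The repeat-count notation shared by the tmixpdf and discpdf forms can be expanded as sketched below. The helper names are hypothetical, and the sketch assumes that each repShort arrives as a single token with the repeat count written as a decimal integer directly after the asterisk, as in 0*5. The conversion to floating-point weights applies only to tmixpdf; the scaled log probabilities of discpdf (section 7.6) are left as integers.

```python
def expand_weight_list(tokens):
    """Expand repShort tokens such as '0*5' into a flat list of integers."""
    values = []
    for tok in tokens:
        # partition() returns (value, '*', count) when an asterisk is
        # present, and (value, '', '') when it is not.
        value, star, count = tok.partition("*")
        repeat = int(count) if star else 1
        values.extend([int(value)] * repeat)
    return values


def tmix_weights(tokens, scale=32767):
    """Convert expanded tmixpdf weights so that 32767 maps to 1.0."""
    return [value / scale for value in expand_weight_list(tokens)]


if __name__ == "__main__":
    print(expand_weight_list(["16384", "16383", "0*5"]))
    print(tmix_weights(["32767", "0*2"]))
```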
The definition of a Gaussian pdf requires the mean vector to
be given and one of the possible forms of covariance
mixpdf = ~m macro | mean cov [ <GConst> float ]
mean = ~u macro | <Mean> short vector
cov = var | inv | xform
var = ~v macro | <Variance> short vector
inv = ~i macro |
      (<InvCovar> | <LLTCovar>) short tmatrix
xform = ~x macro | <Xform> short short matrix
matrix = float {float}
tmatrix = matrix
In mean and var, the short preceding the vector defines the length of the vector; in inv, the short preceding the tmatrix gives the size of this square upper triangular matrix; and in xform, the two shorts preceding the matrix give the number of rows and columns. The optional <GConst> gives that part of the log probability of a Gaussian that can be precomputed. If it is omitted, it will be computed during load-in; including it simply saves some time. HTK tools which output HMM definitions always include this field.
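To illustrate what a precomputable constant of this kind looks like, the sketch below treats the diagonal covariance case. The text does not give the formula, so this is an assumption: the constant is taken to be ln((2 pi)^n |Sigma|), which for a diagonal covariance is n ln(2 pi) plus the sum of the log variances. The function names are hypothetical.

```python
import math


def gconst_diag(variances):
    """Assumed constant n*ln(2*pi) + sum(ln(var)) for a diagonal covariance.

    This depends only on the model parameters, never on the observation,
    which is why it can be computed once at load time.
    """
    n = len(variances)
    return n * math.log(2.0 * math.pi) + sum(math.log(v) for v in variances)


def log_gaussian_diag(x, mean, variances, gconst=None):
    """Log density of a diagonal-covariance Gaussian at observation x."""
    if gconst is None:
        # Field omitted: compute it here, as would happen during load-in.
        gconst = gconst_diag(variances)
    dist = sum((xi - mi) ** 2 / vi for xi, mi, vi in zip(x, mean, variances))
    return -0.5 * (gconst + dist)


if __name__ == "__main__":
    print(gconst_diag([1.0, 1.0]))
    print(log_gaussian_diag([0.0], [0.0], [1.0]))
```

Passing a stored value as gconst skips the logarithms, which is the time saving the text refers to.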
In addition to defining the output distributions, a state can have a
duration probability distribution defined for it.
duration = ~d macro | <Duration> short vector
Alternatively, as shown by the top level syntax for a hmmdef, duration parameters can be specified for a whole model.
Finally, the transition matrix is defined by
transP = ~t macro | <TransP> short matrix
where the short in this case should be equal to the number of states in the model.
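The size constraint on the transition matrix can be checked while reading it. This is a minimal sketch that assumes the input has already been split into tokens and that the matrix is square and listed row by row; the name parse_transp is hypothetical.

```python
def parse_transp(tokens, num_states):
    """Read '<TransP> short matrix' and return the matrix as a list of rows.

    tokens     -- e.g. ['<TransP>', '3', '0.0', '1.0', ...]
    num_states -- the value that followed <NumStates> for this model
    """
    if tokens[0].upper() != "<TRANSP>":
        raise ValueError("expected <TransP>")
    size = int(tokens[1])
    if size != num_states:
        raise ValueError(
            "<TransP> size %d does not match <NumStates> %d"
            % (size, num_states)
        )
    values = [float(tok) for tok in tokens[2:]]
    # Assumption: a square matrix stored row by row.
    if len(values) != size * size:
        raise ValueError("expected %d matrix elements" % (size * size))
    return [values[row * size:(row + 1) * size] for row in range(size)]


if __name__ == "__main__":
    toks = "<TransP> 3 0.0 1.0 0.0 0.0 0.6 0.4 0.0 0.0 0.0".split()
    for row in parse_transp(toks, num_states=3):
        print(row)
```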