Next: 13.4.2 Use
Up: 13.4 HDMan
Previous: 13.4 HDMan
The HTK tool HDMAN is used to prepare a pronunciation dictionary
from one or more sources. It
reads in a list of editing commands from a
script file and then outputs an edited and merged copy of
one or more dictionaries.
Each source pronunciation dictionary consists of comment lines and
definition lines.
Comment lines start with the # character
(or optionally any one of a set of specified comment characters)
and are ignored by HDMAN.
Each definition line starts with a word and is followed by
a sequence of symbols (phones) that define the pronunciation.
The words and the phones are delimited by spaces
or tabs, and the end of line delimits each definition.
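The source dictionary format described above can be sketched in a few lines of Python. This is an illustrative reading of the text, not HDMAN's own parser: the function name `parse_dictionary` and the `comment_chars` parameter are hypothetical, blank lines are assumed to be skipped, and any extra per-entry fields that real dictionaries may carry are not handled.

```python
from typing import Dict, List


def parse_dictionary(text: str, comment_chars: str = "#") -> Dict[str, List[List[str]]]:
    """Map each word to the list of its pronunciations (each a list of phones)."""
    entries: Dict[str, List[List[str]]] = {}
    for line in text.splitlines():
        # Assumption: blank lines are skipped; comment lines are recognised
        # by their first character only.
        if not line.strip() or line[0] in comment_chars:
            continue
        # Words and phones are delimited by spaces or tabs.
        word, *phones = line.split()
        entries.setdefault(word, []).append(phones)
    return entries
```

A word that appears on several definition lines simply accumulates several pronunciations under the same key.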
Each edit command in the script file must be on a separate line.
Lines in the script file starting with a # are comment lines
and are ignored. The commands supported are listed below. They
can be displayed by HDMAN using the
-Q option.
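As a rough illustration of the script file layout (one command per line, # lines ignored), the following Python sketch splits a script into command names and arguments. The function name `parse_edit_script` is hypothetical, and the sketch is slightly more lenient than the text: it also skips blank lines and tolerates leading whitespace before a #.

```python
from typing import List, Tuple


def parse_edit_script(text: str) -> List[Tuple[str, List[str]]]:
    """Return (command, arguments) pairs in the order they appear in the script."""
    commands: List[Tuple[str, List[str]]] = []
    for line in text.splitlines():
        stripped = line.strip()
        # Assumption: blank lines are skipped along with # comment lines.
        if not stripped or stripped.startswith("#"):
            continue
        name, *args = stripped.split()
        commands.append((name, args))
    return commands
```

For example, a script containing the lines `RS cmu` and `MP sil sil sp` would yield the pairs `("RS", ["cmu"])` and `("MP", ["sil", "sil", "sp"])`.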
When no edit files are specified, HDMAN simply merges all of
the input dictionaries and outputs them in sorted order. All input
dictionaries must be sorted. Each input dictionary xxx may be
processed by its own private set of edit commands stored in xxx.ded.
After each input dictionary has been processed by its own
edit script, the merged dictionary can be processed by
commands in global.ded (or some other specified
global edit file name).
Dictionaries are processed on a word by word basis in the order that
they appear on the command line. Thus, all of
the pronunciations for a given word are loaded into a buffer, then
all edit commands are applied to these pronunciations. The result
is then output and the next word loaded.
Where two or more dictionaries give pronunciations for the same word,
the default behaviour is that only the first set of pronunciations
encountered is retained and all others are ignored. An option exists
to override this so that all pronunciations are concatenated.
Dictionary entries can be filtered by a word list such that all
entries not in the list are ignored.
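The merging rules above (first source wins by default, optional concatenation, optional word-list filter) can be sketched as follows. This is a simplified model under stated assumptions, not HDMAN's implementation: `merge_dictionaries`, `keep_all` and `word_list` are hypothetical names, each dictionary is represented as a mapping from word to pronunciations, duplicates are not removed when concatenating, and Python's default string ordering stands in for whatever sort order HDMAN uses.

```python
from typing import Dict, Iterable, List, Optional, Set

Pronunciations = List[List[str]]


def merge_dictionaries(
    dictionaries: Iterable[Dict[str, Pronunciations]],
    keep_all: bool = False,
    word_list: Optional[Set[str]] = None,
) -> Dict[str, Pronunciations]:
    """Merge dictionaries given in command-line order."""
    merged: Dict[str, Pronunciations] = {}
    for source in dictionaries:
        for word, prons in source.items():
            # Entries not in the word list are ignored.
            if word_list is not None and word not in word_list:
                continue
            if word not in merged:
                # First set of pronunciations encountered for this word.
                merged[word] = [list(p) for p in prons]
            elif keep_all:
                # Override: concatenate pronunciations from later sources.
                merged[word].extend(list(p) for p in prons)
    # Assumption: output order is plain string order.
    return dict(sorted(merged.items()))
```

With the default settings, a word defined in both the first and second dictionary keeps only the pronunciations from the first.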
The edit commands provided by HDMAN are as follows:
- AS A B ...
- Append silence models A, B, etc to
each pronunciation.
- CR X A Y B
- Replace phone Y in the context of A_B
by X. Contexts may include an asterisk * to denote any
phone, or a context set
defined using the DC command.
- DC X A B ...
- Define the set A, B, ... as
the context X.
- DD X A B ...
- Delete the definition for word X starting
with phones A, B, ....
- DP A B C ...
- Delete any occurrences of phones A or
B or C ....
- DS src
- Delete each pronunciation from source src
unless it is the only one for the current word.
- DW X Y Z ...
- Delete words (and their definitions) X,
Y, Z, ....
- FW X Y Z ...
- Define X,
Y, Z, ... as function words and
change each phone
in the definition to a function word specific phone. For example,
in word W phone A would become W.A.
- IR
- Set the input mode to raw.
In raw mode, words are regarded as arbitrary sequences of printing
characters. In the default mode, words are strings as defined
in section 4.6.
- LC [X]
- Convert all phones to be left-context dependent. If X is given
then the first phone a in each word is changed to X-a,
otherwise it is unchanged.
- LP
- Convert all phones to lowercase.
- LW
- Convert all words to lowercase.
- MP X A B ...
- Merge any sequence of phones A B
... and rename as X.
- RC [X]
- Convert all phones to be right-context dependent. If
X is given then the last phone z in each word is
changed to z+X, otherwise it is unchanged.
- RP X A B ...
- Replace all occurrences of phones A
or B ... by X.
- RS system
- Remove stress marking. Currently the only
stress marking system
supported is that used in the dictionaries produced by
Carnegie Mellon University (system = cmu).
- RW X A B ...
- Replace all occurrences of word A
or B ... by X.
- SP X A B ...
- Split phone X into the sequence
A B C ....
- TC [X [Y]]
- Convert phones to triphones. If
X is given then the first phone a is converted to
X-a+b, otherwise it is unchanged. If Y is given
then the last phone z is converted to y-z+Y;
otherwise, if X is given,
it is changed to y-z+X; otherwise it is unchanged.
- UP
- Convert all phones to uppercase.
- UW
- Convert all words to uppercase.
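The context-dependent conversions (LC, RC and TC) are the least obvious of the commands above, so a small Python sketch of their naming rules may help. It follows the descriptions as written in this section and is not taken from HDMAN's source: the function names are hypothetical, and the handling of single-phone words in `triphones` (left unchanged) is an assumption, since the text does not cover that case.

```python
from typing import List, Optional


def left_context(phones: List[str], x: Optional[str] = None) -> List[str]:
    """LC [X]: each phone becomes prev-phone; the first becomes X-a if X is given."""
    out = []
    for i, p in enumerate(phones):
        if i == 0:
            out.append(f"{x}-{p}" if x is not None else p)
        else:
            out.append(f"{phones[i - 1]}-{p}")
    return out


def right_context(phones: List[str], x: Optional[str] = None) -> List[str]:
    """RC [X]: each phone becomes phone+next; the last becomes z+X if X is given."""
    last = len(phones) - 1
    out = []
    for i, p in enumerate(phones):
        if i == last:
            out.append(f"{p}+{x}" if x is not None else p)
        else:
            out.append(f"{p}+{phones[i + 1]}")
    return out


def triphones(phones: List[str], x: Optional[str] = None,
              y: Optional[str] = None) -> List[str]:
    """TC [X [Y]]: interior phones become prev-phone+next; boundaries use X and Y."""
    if len(phones) < 2:
        # Assumption: single-phone (or empty) pronunciations are left unchanged.
        return list(phones)
    last = len(phones) - 1
    out = []
    for i, p in enumerate(phones):
        if i == 0:
            out.append(f"{x}-{p}+{phones[1]}" if x is not None else p)
        elif i == last:
            # Y takes precedence; X is the fallback right context for the last phone.
            boundary = y if y is not None else x
            out.append(f"{phones[i - 1]}-{p}+{boundary}" if boundary is not None else p)
        else:
            out.append(f"{phones[i - 1]}-{p}+{phones[i + 1]}")
    return out
```

Under these rules, calling `triphones(["k", "ae", "t"], "sil")` gives `["sil-k+ae", "k-ae+t", "ae-t+sil"]`, because the single boundary argument is reused as the right context of the final phone.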
ECRL HTK_V2.1