Many HTK definition files include names of various types of objects: for example labels, model names, words in a grammar, etc. In order to achieve some uniformity, HTK applies standard rules for reading strings which are names.
A name string consists of a single white space delimited word or a quoted string. Either the single quote ' or the double quote " can be used to quote strings but the start and end quotes must be matched. The backslash \ character can also be used to introduce otherwise reserved characters. The character following a backslash is inserted into the string without special processing unless that character is a digit in the range 0 to 7. In that case, the three characters following the backslash are read and interpreted as an octal character code. When the three characters are not octal digits the result is not well defined.
In summary the special processing is
Notation | Meaning |
\\ | \ |
\_ | a space that will not terminate a string (here _ stands for a literal space character after the backslash) |
\' | ' (and will not end a quoted string) |
\" | " (and will not end a quoted string) |
\nnn | the character with octal code nnn |
Note that the above allows the same effect to be achieved in a number of different ways. For example,
"\"QUOTE"
\"QUOTE
'"QUOTE'
\042QUOTE
all produce the string "QUOTE.
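The reading rules above can be sketched in a few lines of Python. This is only an illustrative reader written from the description in this section, not HTK's own implementation; the function name `read_name` is hypothetical. Since the result of a malformed octal escape is described as not well defined, the sketch simply raises an error in that case, which is an assumption of the example.

```python
def read_name(text, pos=0):
    """Read one name string from text, starting at index pos.

    Returns (name, next_pos), or (None, next_pos) if only white space remains.
    Illustrative sketch of the rules in this section, not HTK code.
    """
    n = len(text)
    # Skip leading white space.
    while pos < n and text[pos].isspace():
        pos += 1
    if pos >= n:
        return None, pos

    # A leading ' or " starts a quoted string that must end with the same quote.
    quote = text[pos] if text[pos] in "'\"" else None
    if quote:
        pos += 1

    out = []
    while pos < n:
        ch = text[pos]
        if quote and ch == quote:
            return "".join(out), pos + 1  # matching end quote
        if not quote and ch.isspace():
            break  # white space ends an unquoted word
        if ch == "\\":
            if pos + 1 >= n:
                raise ValueError("backslash at end of input")
            nxt = text[pos + 1]
            if nxt in "01234567":
                # Three characters are read as an octal character code.
                digits = text[pos + 1:pos + 4]
                if len(digits) != 3 or any(d not in "01234567" for d in digits):
                    # Assumption: reject input whose result is "not well defined".
                    raise ValueError("malformed octal escape")
                out.append(chr(int(digits, 8)))
                pos += 4
            else:
                # Any other character is inserted without special processing.
                out.append(nxt)
                pos += 2
            continue
        out.append(ch)
        pos += 1

    if quote:
        raise ValueError("unterminated quoted string")
    return "".join(out), pos


# The four equivalent forms from the text all yield the same name.
for form in [r'"\"QUOTE"', r'\"QUOTE', "'\"QUOTE'", r'\042QUOTE']:
    print(read_name(form)[0])
```

Each of the four calls prints "QUOTE. Returning the next read position alongside the name lets a caller read successive names from one line by calling the function repeatedly.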
The only exceptions to the above general rules are:
In HHED scripts, commas (,), dots (.), and closing brackets ()) are all used as extra delimiters to allow HHED scripts created for earlier versions of HTK to be used unchanged. Hence, for example, (a,b,c,d) would be split into 4 distinct name strings a, b, c and d.
In master label files (MLFs), the reserved . and /// terminators must appear alone on a line with no surrounding white space.
If this causes problems reading old MLF files, the configuration
variable V1COMPAT should be set true in the module HLABEL.
In this case, HTK will attempt to simulate the behaviour of the older version 1.5.
If the configuration variable QUOTECHAR is set to ' or ", then output labels will be quoted with the specified quote character. If QUOTECHAR is set to \, then output labels will be escaped. The default is to select the simplest quoting mechanism.
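As a rough illustration of the two explicit QUOTECHAR modes, the sketch below renders a label either wrapped in a chosen quote character or with backslash escapes. The function `quote_label` is hypothetical, and the choice of which characters to escape (white space, quotes and backslashes) is an assumption made so that the reading rules in this section would recover the original label; the text does not specify HTK's exact output behaviour, and the default "simplest" mechanism is not modelled.

```python
def quote_label(label, quote_char):
    """Render label for output under one QUOTECHAR setting (illustrative only).

    quote_char is "'" or '"' for quoting, or a backslash for escaping.
    """
    if quote_char in ("'", '"'):
        # Assumption: protect backslashes and the chosen quote inside the string.
        body = label.replace("\\", "\\\\").replace(quote_char, "\\" + quote_char)
        return quote_char + body + quote_char
    if quote_char == "\\":
        out = []
        for ch in label:
            # Assumption: escape white space, quotes and backslashes.
            if ch.isspace() or ch in "\\'\"":
                out.append("\\")
            out.append(ch)
        return "".join(out)
    raise ValueError("quote_char must be ', \" or \\")


print(quote_label('"QUOTE', "'"))   # '"QUOTE'
print(quote_label('"QUOTE', '"'))   # "\"QUOTE"
print(quote_label('"QUOTE', "\\"))  # \"QUOTE
```

The three outputs correspond to three of the equivalent input forms for the string "QUOTE listed earlier in this section.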
Note that under some versions of Unix, HTK can support the 8-bit character sets used for the representation of various orthographies. In such cases the shell environment variable $LANG usually governs which ISO character set is in use.