The goal of the system to be built here is to provide a voice-operated interface for phone dialling. Thus, the recogniser must handle digit strings and also personal name lists. Examples of typical inputs might be
Dial three three two six five four
Dial nine zero four one oh nine
Phone Woodland
Call Steve Young
HTK provides a grammar definition language for specifying simple task grammars such as this. It consists of a set of variable definitions followed by a regular expression describing the words to recognise. For the voice dialling application, a suitable grammar might be
    $digit = ONE | TWO | THREE | FOUR | FIVE |
             SIX | SEVEN | EIGHT | NINE | OH | ZERO;
    $name  = [ JOOP ] JANSEN |
             [ JULIAN ] ODELL |
             [ DAVE ] OLLASON |
             [ PHIL ] WOODLAND |
             [ STEVE ] YOUNG;
    ( SENT-START ( DIAL <$digit> | (PHONE|CALL) $name) SENT-END )
where the vertical bars denote alternatives, the square brackets denote optional items and the angle braces denote one or more repetitions. The complete grammar can be depicted as a network as shown in Fig. 3.1.
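To make the notation concrete, the following is a minimal Python sketch of the same language expressed as an ordinary regular expression. It is not part of HTK, and the names DIGITS, NAMES, TASK_RE and accepts are hypothetical, introduced only for illustration. The sketch assumes that SENT-START and SENT-END act as sentence-boundary markers rather than spoken words, so they are left out of the pattern.

```python
import re

# Word lists copied from the task grammar above.
DIGITS = ["ONE", "TWO", "THREE", "FOUR", "FIVE", "SIX",
          "SEVEN", "EIGHT", "NINE", "OH", "ZERO"]

# (optional first name, surname) pairs, mirroring "[ FIRST ] SURNAME".
NAMES = [("JOOP", "JANSEN"), ("JULIAN", "ODELL"), ("DAVE", "OLLASON"),
         ("PHIL", "WOODLAND"), ("STEVE", "YOUNG")]

# $digit: the vertical bar denotes alternatives.
digit = "(?:" + "|".join(DIGITS) + ")"

# $name: square brackets denote an optional item, hence "(?:FIRST )?".
name = "(?:" + "|".join(f"(?:{first} )?{last}" for first, last in NAMES) + ")"

# DIAL <$digit> | (PHONE|CALL) $name
# Angle braces denote one or more repetitions, hence "(?: digit)+".
# Assumption: SENT-START and SENT-END are boundary markers and are omitted.
TASK_RE = re.compile(rf"(?:DIAL(?: {digit})+|(?:PHONE|CALL) {name})")


def accepts(utterance: str) -> bool:
    """Return True if the word sequence is allowed by the task grammar."""
    words = " ".join(utterance.upper().split())  # normalise case and spacing
    return TASK_RE.fullmatch(words) is not None
```

Under these assumptions, accepts("Phone Woodland") and accepts("Call Steve Young") hold, whereas a bare "Dial" is rejected because the repetition requires at least one digit.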
The above high-level representation of a task grammar is provided for user convenience. The HTK recogniser actually requires a word network to be defined using a low-level notation called HTK Standard Lattice Format (SLF), in which each word instance and each word-to-word transition is listed explicitly. This word network can be created automatically from the grammar above using the HParse tool. Thus, assuming that the file gram contains the above grammar, executing
    HParse gram wdnet
will create an equivalent word network in the file wdnet (see Fig. 3.2).
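When this step is scripted as part of a larger build, the same command can be issued from Python. The following is a small sketch, not an HTK facility: the helper names hparse_command and build_word_network are hypothetical, and the only HTK behaviour assumed is the command line shown above (grammar file first, output network second).

```python
import shutil
import subprocess


def hparse_command(grammar_file: str = "gram",
                   network_file: str = "wdnet") -> list[str]:
    """Build the argument list for the HParse call shown above."""
    return ["HParse", grammar_file, network_file]


def build_word_network(grammar_file: str = "gram",
                       network_file: str = "wdnet") -> bool:
    """Run HParse if it is on the PATH; return False if it is not installed."""
    if shutil.which("HParse") is None:
        return False
    # check=True raises CalledProcessError if HParse exits with an error.
    subprocess.run(hparse_command(grammar_file, network_file), check=True)
    return True
```

Keeping the argument list in its own function makes the command easy to inspect or log before anything is executed.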