Next: 11.4 Bigram Language Models Up: 11 Networks, Dictionaries and Language Models Previous: 11.2 Word Networks and Standard Lattice Format

11.3 Building a Word Network with HPARSE

Whilst the construction of a word-level SLF network file by hand is not difficult, it can be somewhat tedious. In earlier versions of HTK, a high-level grammar notation based on extended Backus-Naur Form (EBNF) was used to specify recognition grammars. This HParse format was read in directly by the recogniser and compiled into a finite state recognition network at run time.

In HTK V2.0, HParse format is still supported, but in the form of an off-line compilation into an SLF word network which can subsequently be used to drive a recogniser.

A HParse format grammar consists of an extended form of regular expression enclosed within parentheses. Expressions are constructed from sequences of words and the metacharacters:

     |       denotes alternatives
     [ ]     encloses options
     { }     denotes zero or more repetitions
     < >     denotes one or more repetitions
     << >>   denotes a context-sensitive loop

The following examples will illustrate the use of all of these except the last, which is a special-purpose facility for constructing context-sensitive loops of the kind found in context-dependent phone loops and word-pair grammars. It is described in the reference entry for HPARSE.
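Since an HParse grammar is a form of regular expression over words, the meaning of the four metacharacters used in the examples can be pinned down with a small matcher. The sketch below is purely illustrative and is not how HPARSE works (HPARSE compiles the grammar into an SLF network rather than matching word strings); the combinator names word, seq, alt, opt, star, plus and accepts are hypothetical.

```python
from typing import Callable, FrozenSet, List

# A matcher takes the token list and a start position and returns the
# set of positions at which a match starting there can end.
Matcher = Callable[[List[str], int], FrozenSet[int]]


def word(w: str) -> Matcher:
    """A bare word: matches one token equal to w."""
    def m(toks: List[str], i: int) -> FrozenSet[int]:
        return frozenset({i + 1}) if i < len(toks) and toks[i] == w else frozenset()
    return m


def seq(*parts: Matcher) -> Matcher:
    """Juxtaposition: the parts must match one after another."""
    def m(toks: List[str], i: int) -> FrozenSet[int]:
        ends = frozenset({i})
        for p in parts:
            ends = frozenset(j for k in ends for j in p(toks, k))
        return ends
    return m


def alt(*choices: Matcher) -> Matcher:
    """a | b: any one of the alternatives."""
    def m(toks: List[str], i: int) -> FrozenSet[int]:
        return frozenset(j for c in choices for j in c(toks, i))
    return m


def opt(p: Matcher) -> Matcher:
    """[ a ]: a may be skipped."""
    def m(toks: List[str], i: int) -> FrozenSet[int]:
        return frozenset({i}) | p(toks, i)
    return m


def star(p: Matcher) -> Matcher:
    """{ a }: zero or more repetitions of a."""
    def m(toks: List[str], i: int) -> FrozenSet[int]:
        reached = {i}
        frontier = {i}
        while frontier:
            # Only keep end positions not seen before, so the loop terminates.
            frontier = {j for k in frontier for j in p(toks, k)} - reached
            reached |= frontier
        return frozenset(reached)
    return m


def plus(p: Matcher) -> Matcher:
    """< a >: one or more repetitions, i.e. a followed by { a }."""
    return seq(p, star(p))


def accepts(g: Matcher, sentence: str) -> bool:
    """True if the whole word sequence matches the grammar g."""
    toks = sentence.split()
    return len(toks) in g(toks, 0)


# ( sil < one | two | ... | zero > sil ), the connected-digit grammar
DIGIT = alt(*(word(w) for w in "one two three four five six seven eight nine zero".split()))
CONNECTED = seq(word("sil"), plus(DIGIT), word("sil"))
print(accepts(CONNECTED, "sil one two sil"))  # True
print(accepts(CONNECTED, "sil sil"))          # False
```

Defining < a > as a followed by { a } mirrors the usual regular-expression identity between "+" and "*", which is why only zero-or-more repetition needs its own loop.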

As a first example, suppose that a simple isolated-word, single-digit recogniser was required. A suitable syntax would be

     (
        one | two | three | four | five |
        six | seven | eight | nine | zero
     )

This would translate into the network shown in part (a) of Fig. 11.4. If this HParse format syntax definition was stored in a file called digitsyn, the equivalent SLF word network would be generated in the file digitnet by typing

     HParse digitsyn digitnet
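To give a rough idea of what the compiled network for grammar (a) contains, the sketch below writes an SLF word network by hand using the SLF header (VERSION, N, L), node (I, W) and link (J, S, E) fields from Section 11.2: a !NULL entry node fans out to one node per digit, and each digit node joins a !NULL exit node. The node numbering and the use of exactly two !NULL nodes are assumptions of this sketch; the file HPARSE writes may be organised differently, and alternatives_to_slf is a hypothetical helper.

```python
from typing import List


def alternatives_to_slf(words: List[str]) -> str:
    """Write an SLF network allowing exactly one of the given words."""
    start, end = 0, len(words) + 1
    nodes = [(start, "!NULL")]                     # entry node, no word
    nodes += [(k + 1, w) for k, w in enumerate(words)]
    nodes.append((end, "!NULL"))                   # exit node, no word
    links = []
    for k in range(len(words)):
        links.append((start, k + 1))               # entry -> word
        links.append((k + 1, end))                 # word -> exit
    lines = ["VERSION=1.0", f"N={len(nodes)} L={len(links)}"]
    lines += [f"I={i} W={w}" for i, w in nodes]
    lines += [f"J={j} S={s} E={e}" for j, (s, e) in enumerate(links)]
    return "\n".join(lines) + "\n"


DIGITS = "one two three four five six seven eight nine zero".split()
print(alternatives_to_slf(DIGITS))
```

For the ten digits this gives 12 nodes and 20 links.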

The above digit syntax assumes that each input digit is properly end-pointed. This requirement can be removed by adding a silence model before and after the digit

     (
        sil (one | two | three | four | five |
        six | seven | eight | nine | zero) sil
     )

As shown by graph (b) in Fig. 11.4, the allowable sequence of models now consists of silence followed by a digit followed by silence. If a sequence of digits needed to be recognised, angle brackets could be used to indicate one or more repetitions; the HParse grammar

     (
        sil < one | two | three | four | five |
        six | seven | eight | nine | zero > sil
     )

would accomplish this. Part (c) of Fig. 11.4 shows the network that would result in this case.
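To see how the angle brackets become a loop, the sketch below hand-builds a word network for the connected-digit grammar and checks whether a word sequence traces a path from the leading sil to the trailing sil. The node numbering and the single !NULL loop-back node are assumptions of this sketch, not a description of the network HPARSE actually generates; accepts is a hypothetical helper.

```python
from typing import Dict, Set

DIGITS = "one two three four five six seven eight nine zero".split()

# Node index -> word label. "!NULL" nodes consume no input.
NODES: Dict[int, str] = {0: "sil", 11: "!NULL", 12: "sil"}
NODES.update({k + 1: w for k, w in enumerate(DIGITS)})

LINKS: Dict[int, Set[int]] = {n: set() for n in NODES}
for d in range(1, 11):
    LINKS[0].add(d)     # leading sil -> first digit
    LINKS[d].add(11)    # digit -> loop-back null node
    LINKS[11].add(d)    # null node -> another digit (the < > repetition)
    LINKS[d].add(12)    # digit -> trailing sil
START, FINAL = 0, 12


def successors(nodes: Set[int]) -> Set[int]:
    """Word nodes reachable in one step, passing through !NULL nodes."""
    out: Set[int] = set()
    stack = [s for n in nodes for s in LINKS[n]]
    seen: Set[int] = set()
    while stack:
        n = stack.pop()
        if n in seen:
            continue
        seen.add(n)
        if NODES[n] == "!NULL":
            stack.extend(LINKS[n])
        else:
            out.add(n)
    return out


def accepts(sentence: str) -> bool:
    """True if the words follow a path from START to FINAL."""
    toks = sentence.split()
    if not toks or NODES[START] != toks[0]:
        return False
    current = {START}
    for tok in toks[1:]:
        current = {n for n in successors(current) if NODES[n] == tok}
        if not current:
            return False
    return FINAL in current


print(accepts("sil one two sil"))  # True
print(accepts("sil sil"))          # False
```

Routing the repetition through one null node keeps the number of links linear in the vocabulary size, whereas linking every digit directly to every other digit would need a quadratic number of links.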

[Fig. 11.4: word networks for the digit grammars, parts (a) to (d)]

HParse grammars can define variables to represent sub-expressions. Variable names start with a dollar symbol and they are given values by definitions of the form

   $var = expression ;

For example, the above connected digit grammar could be rewritten as

     $digit = one | two | three | four | five |
              six | seven | eight | nine | zero;
     (
        sil < $digit > sil
     )

Here $digit is a variable whose value is the expression appearing on the right-hand side of the assignment. Whenever the name of a variable appears within an expression, the corresponding expression is substituted. Note, however, that variables must be defined before use; hence, recursion is prohibited.
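The define-before-use rule can be illustrated by expanding variables one definition at a time, in the order they appear, and rejecting any reference to a name not yet defined. The function expand_definitions is hypothetical and works on plain strings; wrapping each substitution in parentheses is an assumption made here so that a substituted list of alternatives keeps its grouping.

```python
import re
from typing import Dict, List, Tuple

VAR = re.compile(r"\$(\w+)")


def expand_definitions(defs: List[Tuple[str, str]]) -> Dict[str, str]:
    """Expand $var references, processing definitions in order.

    A right-hand side may only mention variables defined on earlier
    lines, so any self-reference (direct or indirect) is rejected.
    """
    env: Dict[str, str] = {}

    def substitute(m: "re.Match[str]") -> str:
        name = m.group(1)
        if name not in env:
            raise ValueError(f"${name} used before it is defined")
        # Parenthesise so an alternative like $digit keeps its grouping.
        return f"( {env[name]} )"

    for name, rhs in defs:
        env[name] = VAR.sub(substitute, rhs)
    return env


env = expand_definitions([
    ("digit", "one | two | three"),
    ("number", "$digit { [pause] $digit }"),
])
print(env["number"])
# ( one | two | three ) { [pause] ( one | two | three ) }
```

A definition such as $loop = a $loop; raises an error because $loop is not yet defined when its own right-hand side is expanded.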

As a final refinement of the digit grammar, the start and end silence can be made optional by enclosing them within square brackets thus

     $digit = one | two | three | four | five |
              six | seven | eight | nine | zero;
     (
        [sil] < $digit > [sil]
     )

Part (d) of Fig. 11.4 shows the network that would result in this last case.
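Because HParse grammars are regular expressions over words, the set of word sequences allowed by grammar (d) can be checked with Python's re module by mapping [ ] to ( )?, { } to ( )*, < > to ( )+ and | to |. The sketch below hand-translates grammar (d) under that mapping; it only tests which word sequences the grammar allows and says nothing about the network HPARSE builds. The names GRAMMAR_D and accepts are hypothetical.

```python
import re

# Each word in the pattern is followed by exactly one space.
DIGIT = "(?:one|two|three|four|five|six|seven|eight|nine|zero) "
# [sil] < $digit > [sil]  ->  (sil )? (digit )+ (sil )?
GRAMMAR_D = re.compile(f"(?:sil )?(?:{DIGIT})+(?:sil )?")


def accepts(words: str) -> bool:
    """True if the whitespace-separated words match grammar (d)."""
    normalised = " ".join(words.split()) + " "
    return GRAMMAR_D.fullmatch(normalised) is not None


print(accepts("sil one two sil"))  # True
print(accepts("sil sil"))          # False
```

Both silences may now be omitted, but at least one digit is still required by the < > repetition.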

HParse format grammars are a convenient way of specifying task grammars for interactive voice interfaces. As a final example, the following defines a simple grammar for the control of a telephone by voice.

     $digit  = one | two | three | four | five |
               six | seven | eight | nine | zero;
     $number = $digit { [pause] $digit};
     $scode  = shortcode $digit $digit;
     $telnum = $scode | $number;
     $cmd    = dial $telnum | 
               enter $scode for $number |
               redial | cancel;
     $noise  = lipsmack | breath | background;
     ( < $cmd | $noise > )

The dictionary entries for pause, lipsmack, breath and background would reference HMMs trained to model these types of noise and the corresponding output symbols in the dictionary would be null.
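The effect of null output symbols can be sketched as a post-processing step, assuming the HTK dictionary layout in which an optional output symbol in square brackets follows the word and empty brackets [] mean that nothing is output. The dictionary fragment, its model names, and the helpers parse_outputs and transcribe are all hypothetical; the default of outputting the word itself when no symbol is given is an assumption of this sketch.

```python
from typing import Dict, List, Optional

# Hypothetical dictionary fragment: WORD, an optional [OUTSYM], then the
# model names. "[]" marks a null output symbol. Model names are made up.
DICT_TEXT = """\
one        w ah n
dial       d ay ax l
pause      []  pau
lipsmack   []  lipsmack
breath     []  breath
background []  bgnoise
"""


def parse_outputs(text: str) -> Dict[str, Optional[str]]:
    """Map each word to its output symbol, or None for a null output."""
    outputs: Dict[str, Optional[str]] = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        word = fields[0]
        if len(fields) > 1 and fields[1].startswith("[") and fields[1].endswith("]"):
            outputs[word] = fields[1][1:-1] or None  # "[]" -> None
        else:
            outputs[word] = word  # assumed default: output the word itself
    return outputs


def transcribe(recognised: List[str], outputs: Dict[str, Optional[str]]) -> List[str]:
    """Replace each recognised word by its output symbol, dropping nulls."""
    result: List[str] = []
    for w in recognised:
        sym = outputs[w]
        if sym is not None:
            result.append(sym)
    return result


outs = parse_outputs(DICT_TEXT)
print(transcribe(["breath", "dial", "one", "pause", "one", "lipsmack"], outs))
# ['dial', 'one', 'one']
```

The noise words still take part in recognition, so they can absorb lip smacks or pauses, but they never reach the transcription.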

It should also be noted that when the HParse format was used in earlier versions of HTK, word grammars contained word pronunciations embedded within them. This was done by using the reserved node names WD_BEGIN and WD_END to delimit word boundaries. To provide backwards compatibility, HPARSE can process these old-format networks, but when doing so it outputs a dictionary as well as a word network. This compatibility mode is defined fully in the reference section; to use it, the configuration variable V1COMPAT must be set true or the -c option set.

Finally, on the topic of word networks, it is important to note that any network containing an unbroken loop of one or more tee-models will generate an error. For example, if sp is a single-state tee-model used to represent short pauses, then the following network would generate an error:

    ( sil < sp | $digit > sil )

The intention here is to recognise a sequence of digits which may optionally be separated by short pauses. However, since a tee-model can be passed through without consuming any input, and the syntax allows an endless sequence of sp models, the recogniser could traverse this sequence without ever consuming any input. The solution to problems such as these is to rearrange the network. For example, the above could be written as

    ( sil < $digit sp > sil )
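The condition that triggers this error can be phrased as a graph check: is there a cycle whose nodes are all tee-models? The sketch below runs a depth-first search restricted to tee-model nodes over two hand-built networks, with the digit set shortened to three words. The node names and the function tee_loop are hypothetical, and the networks omit !NULL nodes; in a real network those also consume no input and would have to be treated like tee-models.

```python
from typing import Dict, List, Set

DIGITS = ["one", "two", "three"]  # digit set shortened for the example


def tee_loop(links: Dict[str, List[str]], tee: Set[str]) -> bool:
    """True if some cycle runs only through tee-model nodes."""
    WHITE, GREY, BLACK = 0, 1, 2
    colour = {n: WHITE for n in links if n in tee}

    def visit(n: str) -> bool:
        colour[n] = GREY  # on the current DFS path
        for m in links[n]:
            if m not in colour:  # not a tee node: consumes input, breaks the loop
                continue
            if colour[m] == GREY:  # back edge: a tee-only cycle
                return True
            if colour[m] == WHITE and visit(m):
                return True
        colour[n] = BLACK  # fully explored, no tee-only cycle through n
        return False

    return any(colour[n] == WHITE and visit(n) for n in list(colour))


def bad_network() -> Dict[str, List[str]]:
    """( sil < sp | $digit > sil ): any loop item may follow any other."""
    body = ["sp"] + DIGITS
    links: Dict[str, List[str]] = {"sil_in": list(body), "sil_out": []}
    for b in body:
        links[b] = body + ["sil_out"]
    return links


def good_network() -> Dict[str, List[str]]:
    """( sil < $digit sp > sil ): sp can only follow a digit."""
    links: Dict[str, List[str]] = {
        "sil_in": list(DIGITS),
        "sp": DIGITS + ["sil_out"],
        "sil_out": [],
    }
    for d in DIGITS:
        links[d] = ["sp"]
    return links


print(tee_loop(bad_network(), {"sp"}))   # True
print(tee_loop(good_network(), {"sp"}))  # False
```

In the rearranged grammar every path from sp back to sp passes through a digit, which must consume input, so the loop is broken.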


ECRL HTK_V2.1