Next: 11.7 Constructing a Dictionary Up: 11 Networks, Dictionaries and Language Models Previous: 11.5 Building a Word Network with HBUILD

11.6 Testing a Word Network using HSGEN

 

When designing task grammars, it is useful to be able to check that the language defined by the final word network is as envisaged. One simple way to check this is to use the network as a generator by randomly traversing it and outputting the name of each word node encountered. HTK provides a very simple tool called HSGEN for doing this.

As an example, if the file bnet contained the simple Bit-But network described above and the file bdic contained a corresponding dictionary, then the command

    HSGen bnet bdic
would generate a random list of examples of the language defined by bnet, for example,
    start bit but bit bit bit end 
    start but bit but but end 
    start bit bit but but end 
    .... etc
This is perhaps not too informative in this case, but for more complex grammars this type of output can be quite illuminating.
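
The idea of using a network as a generator can be illustrated with a minimal Python sketch. This is not HSGEN's implementation: the NETWORK table is a hypothetical stand-in for the Bit-But network (the real arcs are stored in bnet), the function name generate_sentence is invented for illustration, and each outgoing arc is assumed to be equally likely.

```python
import random

# Hypothetical stand-in for the Bit-But network: each word node maps to the
# word nodes that may follow it. The real arcs are defined in the file bnet.
NETWORK = {
    "start": ["bit", "but"],
    "bit": ["bit", "but", "end"],
    "but": ["bit", "but", "end"],
    "end": [],
}


def generate_sentence(network, rng, entry="start", exit_node="end"):
    """Walk randomly from entry to exit, recording each word node visited."""
    words = [entry]
    node = entry
    while node != exit_node:
        # Assumption: every outgoing arc is chosen with equal probability.
        node = rng.choice(network[node])
        words.append(node)
    return words


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so repeated runs give the same list
    for _ in range(3):
        print(" ".join(generate_sentence(NETWORK, rng)))
```

Every sentence produced this way begins with start, ends with end, and contains at least one of bit or but in between, which is the kind of property that is easy to confirm by eye from a list of generated examples.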

HSGEN will also estimate the empirical entropy by recording the probability of each sentence generated. To use this facility, it is best to suppress the sentence output and generate a large number of examples. For example, executing

    HSGen -s -n 1000 -q bnet bdic
where the -s option requests statistics, the -q option suppresses the output, and -n 1000 asks for 1000 sentences, would generate the following output
    Number of Nodes = 4 [0 null], Vocab Size = 4
    Entropy = 1.156462,  Perplexity = 2.229102
    1000 Sentences: average len = 5.1, min=3, max=19
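
The statistics above can be approximated with a short Python sketch. It rests on several assumptions that are not stated in this section: the entropy is taken as the negative sum of the log2 sentence probabilities divided by the total number of words generated, the perplexity as 2 raised to that entropy, the start and end words count towards sentence length, and the arc probabilities in the hypothetical ARCS table are uniform. The names ARCS, sample_sentence and estimate_entropy are invented for illustration, and the real probabilities in bnet may differ.

```python
import math
import random

# Hypothetical arc probabilities for a Bit-But style network (assumed uniform).
ARCS = {
    "start": [("bit", 0.5), ("but", 0.5)],
    "bit": [("bit", 1 / 3), ("but", 1 / 3), ("end", 1 / 3)],
    "but": [("bit", 1 / 3), ("but", 1 / 3), ("end", 1 / 3)],
}


def sample_sentence(arcs, rng):
    """Return (words, log2 probability) for one random walk from start to end."""
    words, log2_prob, node = ["start"], 0.0, "start"
    while node != "end":
        successors = [w for w, _ in arcs[node]]
        probs = [p for _, p in arcs[node]]
        i = rng.choices(range(len(successors)), weights=probs)[0]
        log2_prob += math.log2(probs[i])  # sentence probability is a product
        node = successors[i]
        words.append(node)
    return words, log2_prob


def estimate_entropy(arcs, n, rng):
    """Estimate per-word entropy, perplexity and average length from n samples."""
    total_log2, total_words = 0.0, 0
    for _ in range(n):
        words, log2_prob = sample_sentence(arcs, rng)
        total_log2 += log2_prob
        total_words += len(words)  # assumption: start and end are counted
    entropy = -total_log2 / total_words
    return entropy, 2.0 ** entropy, total_words / n


if __name__ == "__main__":
    h, pp, avg_len = estimate_entropy(ARCS, 1000, random.Random(0))
    print(f"Entropy = {h:.6f},  Perplexity = {pp:.6f}")
    print(f"1000 Sentences: average len = {avg_len:.1f}")
```

Because the estimate is built from random samples, it varies from run to run unless the seed is fixed, which is why a large number of sentences is recommended.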



ECRL HTK_V2.1: email [email protected]