When designing task grammars, it is useful to be able to check that the language defined by the final word network is as envisaged. One simple way to check this is to use the network as a generator by randomly traversing it and outputting the name of each word node encountered. HTK provides a very simple tool called HSGEN for doing this.
As an example, if the file bnet contained the simple Bit-But network described above and the file bdic contained a corresponding dictionary, then the command

    HSGen bnet bdic

would generate a random list of examples of the language defined by bnet, for example,

    start bit but bit bit bit end
    start but bit but but end
    start bit bit but but end
    .... etc

This is perhaps not too informative in this case, but for more complex grammars this type of output can be quite illuminating.
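The idea of using a network as a generator can be illustrated with a minimal Python sketch. It is not HSGen's implementation. The `NETWORK` table and the `generate_sentence` helper are hypothetical names, and the Bit-But topology (start, then one or more of bit or but, then end) together with the uniform choice among successors are assumptions made for illustration.

```python
import random

# Hypothetical adjacency table for a Bit-But style network: each word node
# maps to the word nodes that may follow it. The topology is an assumption.
NETWORK = {
    "start": ["bit", "but"],
    "bit": ["bit", "but", "end"],
    "but": ["bit", "but", "end"],
    "end": [],
}


def generate_sentence(network, rng, entry="start"):
    """Walk the network at random from `entry` until a node with no successors."""
    words = [entry]
    node = entry
    while network[node]:
        # Assumption: every successor is equally likely.
        node = rng.choice(network[node])
        words.append(node)
    return words


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so repeated runs give the same sentences
    for _ in range(3):
        print(" ".join(generate_sentence(NETWORK, rng)))
```

Each call returns one sentence as a list of word names, so unexpected word sequences in the output point directly at mistakes in the grammar.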
HSGEN will also estimate the empirical entropy by recording the probability of each sentence generated. To use this facility, it is best to suppress the sentence output and generate a large number of examples. For example, executing

    HSGen -s -n 1000 -q bnet bdic

where the -s option requests statistics, the -q option suppresses the output and -n 1000 asks for 1000 sentences, would generate the following output

    Number of Nodes = 4 [0 null], Vocab Size = 4
    Entropy = 1.156462, Perplexity = 2.229102
    1000 Sentences: average len = 5.1, min=3, max=19
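The figures above are related by Perplexity = 2^Entropy (2^1.156462 is approximately 2.2291). The following Python sketch shows one way such statistics could be estimated by sampling. It is not HSGen's code. The `NETWORK` table, the helper names, the uniform choice among successors, and the per-word normalisation (total negative log2 probability divided by the total number of words generated) are all assumptions made for illustration.

```python
import math
import random

# Hypothetical Bit-But style topology (an assumption, not read from bnet).
NETWORK = {
    "start": ["bit", "but"],
    "bit": ["bit", "but", "end"],
    "but": ["bit", "but", "end"],
    "end": [],
}


def sample_with_logprob(network, rng, entry="start"):
    """Return one random sentence and the log2 probability of generating it."""
    words = [entry]
    log2_prob = 0.0
    node = entry
    while network[node]:
        successors = network[node]
        node = rng.choice(successors)  # assumption: uniform over successors
        log2_prob += math.log2(1.0 / len(successors))
        words.append(node)
    return words, log2_prob


def estimate_entropy(network, n_sentences, rng):
    """Estimate per-word entropy (bits) and perplexity from sampled sentences."""
    total_log2 = 0.0
    total_words = 0
    for _ in range(n_sentences):
        words, log2_prob = sample_with_logprob(network, rng)
        total_log2 += log2_prob
        total_words += len(words)
    entropy = -total_log2 / total_words  # assumed per-word normalisation
    return entropy, 2.0 ** entropy


if __name__ == "__main__":
    h, pp = estimate_entropy(NETWORK, 1000, random.Random(0))
    print(f"Entropy = {h:.6f}, Perplexity = {pp:.6f}")
```

Because the estimate is an average over random samples, it varies from run to run, which is why a large number of sentences is recommended.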