
11.9 Other Kinds of Recognition System

Although the recognition facilities of HTK are aimed primarily at sub-word based connected word recognition, they can nevertheless support a variety of other types of recognition system.

To build a phoneme recogniser, a word-level network is defined using an SLF file in the usual way except that each ``word'' in the network represents a single phone. The structure of the network will typically be a loop in which all phones loop back to each other.
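As an illustration of such a loop, the following Python sketch writes out an SLF network for a small phone set. The function name `phone_loop_slf` and the node layout (two `!NULL` nodes on each side of the phones, with a loop-back arc between the inner pair) are assumptions made for this example; other layouts can describe the same loop.

```python
def phone_loop_slf(phones):
    """Return SLF text for a loop in which every phone can follow every other."""
    n = len(phones)
    # Node layout (an assumption of this sketch):
    #   0 = entry, 1 = loop start, 2 .. n+1 = one node per phone,
    #   n+2 = loop end, n+3 = exit.
    loop_start, loop_end, exit_node = 1, n + 2, n + 3
    words = ["!NULL", "!NULL"] + list(phones) + ["!NULL", "!NULL"]

    arcs = [(0, loop_start)]
    for i in range(n):
        arcs.append((loop_start, 2 + i))  # fan out to each phone
    for i in range(n):
        arcs.append((2 + i, loop_end))    # collect from each phone
    arcs.append((loop_end, loop_start))   # loop back for the next phone
    arcs.append((loop_end, exit_node))    # leave the loop

    lines = ["VERSION=1.0", f"N={len(words)} L={len(arcs)}"]
    lines += [f"I={i} W={w}" for i, w in enumerate(words)]
    lines += [f"J={j} S={s} E={e}" for j, (s, e) in enumerate(arcs)]
    return "\n".join(lines) + "\n"


slf = phone_loop_slf(["ih", "eh", "ah"])
print(slf)
```

The null nodes keep the number of arcs linear in the number of phones; connecting every phone directly to every other would need a quadratic number of arcs.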

The dictionary then contains an entry for each ``word'' in which the word and its pronunciation are the same. For example, the dictionary might contain

    ih ih
    eh eh
    ah ah
    ... etc
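A dictionary of this form can be generated mechanically from the phone list. The sketch below is one way to do it; `phone_dictionary` is a hypothetical helper name, and sorting the entries is an assumption made here because HTK tools generally expect dictionaries in sorted order.

```python
def phone_dictionary(phones):
    """Return dictionary text in which each phone is its own pronunciation."""
    # Duplicates are dropped and entries sorted (an assumption of this sketch).
    lines = [f"{p} {p}" for p in sorted(set(phones))]
    return "\n".join(lines) + "\n"


dict_text = phone_dictionary(["ih", "eh", "ah"])
print(dict_text)
```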

Phoneme recognisers often use biphones to provide some measure of context-dependency. Provided that the HMM set contains all the necessary biphones, HNET will expand a simple phone loop into a context-sensitive biphone loop when the configuration variable FORCELEFTBI or FORCERIGHTBI is set to true, as appropriate.
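Whether the HMM set covers all the necessary biphones can be checked before decoding. The following sketch assumes the usual HTK naming convention, `l-p` for a left-context biphone and `p+r` for a right-context biphone, and assumes that in a full phone loop every phone can be adjacent to every other. The function names are hypothetical, and the sketch does not consider which models are needed at the start or end of an utterance.

```python
def required_biphones(phones, side):
    """Enumerate the biphone names a full phone loop could require."""
    if side == "left":
        return {f"{left}-{p}" for p in phones for left in phones}
    if side == "right":
        return {f"{p}+{right}" for p in phones for right in phones}
    raise ValueError("side must be 'left' or 'right'")


def missing_models(phones, hmm_names, side):
    """Return the required biphones that are absent from the HMM list."""
    return sorted(required_biphones(phones, side) - set(hmm_names))


phones = ["ih", "eh"]
hmm_names = ["ih-ih", "eh-ih", "ih-eh"]  # illustrative HMM list, not real data
missing = missing_models(phones, hmm_names, "left")
print(missing)
```

An empty result suggests the loop can be expanded for that context side; any names printed would need to be added to the HMM set first.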

Whole word recognisers can be set up in a similar way. The word network is designed using the same considerations as for a sub-word based system but the dictionary gives the name of the whole-word HMM in place of each word pronunciation.
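As a small illustration, the sketch below builds such a dictionary from a mapping of words to whole-word HMM names and refuses to emit an entry whose model is not in the HMM list. The words, the model names and the helper name `whole_word_dictionary` are all invented for this example.

```python
def whole_word_dictionary(word_to_model, hmm_names):
    """Return dictionary text mapping each word to a single whole-word HMM."""
    known = set(hmm_names)
    lines = []
    for word, model in sorted(word_to_model.items()):
        if model not in known:
            raise KeyError(f"no HMM named {model!r} for word {word!r}")
        lines.append(f"{word} {model}")  # the "pronunciation" is one model name
    return "\n".join(lines) + "\n"


# Hypothetical vocabulary and model names.
ww_dict = whole_word_dictionary(
    {"YES": "yes_hmm", "NO": "no_hmm"},
    hmm_names=["yes_hmm", "no_hmm"],
)
print(ww_dict)
```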

Finally, word spotting systems can be defined by placing each keyword in a word network in parallel with the appropriate filler models. The keywords can be whole-word models or sub-word based. In this case, word transition penalties placed on the transitions can be used to gain fine control over the false alarm rate.
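A minimal sketch of such a network in SLF form follows. It places keywords and fillers in parallel inside a loop and attaches a log score to each arc entering a keyword using the SLF link field `l=`. The looping structure, the helper name `spotting_network_slf`, the word names and the penalty value are assumptions made for illustration; a suitable penalty would have to be tuned on held-out data.

```python
def spotting_network_slf(keywords, fillers, keyword_penalty=0.0):
    """Return SLF text with keywords and fillers in parallel inside a loop."""
    parallel = list(keywords) + list(fillers)
    words = ["!NULL", "!NULL"] + parallel + ["!NULL", "!NULL"]
    loop_start = 1
    loop_end = 2 + len(parallel)
    exit_node = loop_end + 1

    # Each arc is (start node, end node, optional log score).
    arcs = [(0, loop_start, None)]
    for i in range(len(parallel)):
        # Only arcs entering a keyword carry the penalty.
        score = keyword_penalty if i < len(keywords) else None
        arcs.append((loop_start, 2 + i, score))
    for i in range(len(parallel)):
        arcs.append((2 + i, loop_end, None))
    arcs.append((loop_end, loop_start, None))
    arcs.append((loop_end, exit_node, None))

    lines = ["VERSION=1.0", f"N={len(words)} L={len(arcs)}"]
    lines += [f"I={i} W={w}" for i, w in enumerate(words)]
    for j, (s, e, score) in enumerate(arcs):
        link = f"J={j} S={s} E={e}"
        if score is not None:
            link += f" l={score:.2f}"
        lines.append(link)
    return "\n".join(lines) + "\n"


# Hypothetical keywords, filler name and penalty.
spot = spotting_network_slf(["yes", "no"], ["fil"], keyword_penalty=-5.0)
print(spot)
```

Making the penalty more negative makes keyword hypotheses less likely relative to the fillers, which is one way to trade missed detections against false alarms.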

