Now that word networks and dictionaries have been explained, the conversion of word level networks to model-based recognition networks will be described. Referring again to Fig. 11.1, this expansion is performed automatically by the module HNET. By default, HNET attempts to infer the required expansion from the contents of the dictionary and the associated list of HMMs. However, five configuration parameters are supplied to apply more precise control where required: ALLOWCXTEXP, ALLOWXWRDEXP, FORCECXTEXP, FORCELEFTBI and FORCERIGHTBI.
The expansion proceeds in four stages.
The determination of the network type can be modified by using the configuration parameters mentioned earlier. By default, ALLOWCXTEXP is set true. If ALLOWCXTEXP is set false, then no expansion of phone names is performed and each phone corresponds to the model of the same name. The default value of ALLOWXWRDEXP is false, thus preventing context expansion across word boundaries. This also limits the expansion of the phone labels in the dictionary to word-internal contexts only. If FORCECXTEXP is set true, then context expansion will be performed. For example, if the HMM set contained all monophones, all biphones and all triphones, then given a monophone dictionary, the default behaviour of HNET would be to generate a monophone recognition network, since the dictionary would be closed. However, if FORCECXTEXP is set true and ALLOWXWRDEXP is set false, then word-internal context expansion will be performed. If FORCECXTEXP is set true and ALLOWXWRDEXP is set true, then full cross-word context expansion will be performed.
For example, the phone sequence
sil aa r sp y uw sp sil
would be expanded as
sil sil-aa+r aa-r+y sp r-y+uw y-uw+sil sp sil
assuming that sil is context-independent and sp is context-free. For word-internal systems, the context expansion can be further controlled via the configuration variable CFWORDBOUNDARY. When set true (the default setting), context-free phones will be treated as word boundaries, so
aa r sp y uw sp
would be expanded to
aa+r aa-r sp y+uw y-uw sp
Setting CFWORDBOUNDARY false would produce
aa+r aa-r+y sp r-y+uw y-uw sp
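The labelling rules illustrated by these examples can be sketched in a few lines of Python. This is only an illustrative model of the naming convention, not the HNET implementation: the function names expand and expand_word_internal are hypothetical, and the treatment of context-free and context-independent phones is inferred from the examples above.

```python
def make_label(left, phone, right):
    """Build a label of the form left-phone+right, omitting missing contexts."""
    label = phone
    if left is not None:
        label = f"{left}-{label}"
    if right is not None:
        label = f"{label}+{right}"
    return label


def expand(phones, context_free=("sp",), context_independent=()):
    """Context-expand a phone sequence (hypothetical helper, not HTK code).

    Context-free phones are copied unchanged and skipped when looking for
    neighbouring context. Context-independent phones are copied unchanged
    but still serve as context for their neighbours.
    """
    # Indices of phones that are allowed to act as context.
    ctx = [i for i, p in enumerate(phones) if p not in context_free]
    rank = {i: k for k, i in enumerate(ctx)}
    out = []
    for i, p in enumerate(phones):
        if p in context_free or p in context_independent:
            out.append(p)
            continue
        k = rank[i]
        left = phones[ctx[k - 1]] if k > 0 else None
        right = phones[ctx[k + 1]] if k + 1 < len(ctx) else None
        out.append(make_label(left, p, right))
    return out


def expand_word_internal(phones, cf_word_boundary=True, context_free=("sp",)):
    """Word-internal expansion; cf_word_boundary mimics CFWORDBOUNDARY."""
    if not cf_word_boundary:
        return expand(phones, context_free)
    out, segment = [], []
    for p in phones:
        if p in context_free:
            # A context-free phone closes the current segment.
            out.extend(expand(segment, context_free))
            out.append(p)
            segment = []
        else:
            segment.append(p)
    out.extend(expand(segment, context_free))
    return out


cross = expand("sil aa r sp y uw sp sil".split(), context_independent=("sil",))
internal_true = expand_word_internal("aa r sp y uw sp".split(), True)
internal_false = expand_word_internal("aa r sp y uw sp".split(), False)
print(" ".join(cross))
print(" ".join(internal_true))
print(" ".join(internal_false))
```

The three printed lines correspond to the cross-word expansion, the word-internal expansion with CFWORDBOUNDARY true, and the word-internal expansion with CFWORDBOUNDARY false.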
Now that the expansion process has been described in some detail, some simple examples will help clarify it. All of these are based on the Bit-But word network illustrated in Fig. 11.2. Firstly, assume that the dictionary contains simple monophone pronunciations, that is
bit b i t
but b u t
start sil
end sil
and the HMM set consists of just monophones
b i t u sil
In this case, HNET will find a closed dictionary. There will be no expansion and it will directly generate the network shown in Fig. 11.8. In this figure, the rounded boxes represent model nodes and the square boxes represent word-end nodes.
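The notion of a closed dictionary used here can be checked mechanically. The following is a minimal sketch, assuming that "closed" means every phone label appearing in a pronunciation is the name of a model in the HMM list; the function is_closed and the dictionary representation (one pronunciation per word) are illustrative simplifications, not part of HTK.

```python
def is_closed(dictionary, hmm_names):
    """Return True if every phone label in every pronunciation names an HMM.

    dictionary maps each word to a list of phone labels (one pronunciation
    per word, a simplification for this sketch).
    """
    models = set(hmm_names)
    return all(phone in models
               for pron in dictionary.values()
               for phone in pron)


# The monophone dictionary from the example above.
mono_dict = {
    "bit": ["b", "i", "t"],
    "but": ["b", "u", "t"],
    "start": ["sil"],
    "end": ["sil"],
}

closed_with_monophones = is_closed(mono_dict, "b i t u sil".split())
closed_with_sil_only = is_closed(mono_dict, ["sil"])
print(closed_with_monophones, closed_with_sil_only)
```

With the monophone HMM list the dictionary is closed, so no expansion would be needed; with a list that lacks b, i, t and u it is not.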
Similarly, if the dictionary contained word-internal triphone pronunciations such as
bit b+i b-i+t i-t
but b+u b-u+t u-t
start sil
end sil
and the HMM set contains all the required models
b+i b-i+t i-t b+u b-u+t u-t sil
then again HNET will find a closed dictionary and the network shown in Fig. 11.9 would be generated.
If, however, the dictionary contained just the simple monophone pronunciations as in the first case above, but the HMM set contained just triphones, that is
sil-b+i t-b+i b-i+t i-t+sil i-t+b
sil-b+u t-b+u b-u+t u-t+sil u-t+b sil
then HNET would perform full cross-word expansion and generate the network shown in Fig. 11.10.
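To see why exactly these models are needed, the cross-word contexts can be enumerated by hand or with a short script. The sketch below assumes that the Bit-But network allows bit and but to follow one another in any order between a start and an end silence, and that sil itself is context-independent; the function cross_word_models is hypothetical and only enumerates model names, it does not build a network.

```python
def cross_word_models(prons, boundary="sil"):
    """Enumerate the model names needed for cross-word triphone expansion.

    prons is a list of monophone pronunciations for words that may follow
    each other in any order, with the boundary model before the first word
    and after the last word (an assumption about the network topology).
    """
    needed = {boundary}
    firsts = {p[0] for p in prons}   # phones that can start a word
    lasts = {p[-1] for p in prons}   # phones that can end a word
    for pron in prons:
        for i, phone in enumerate(pron):
            # Inside a word the context is fixed; at a word edge it can be
            # the boundary model or the adjacent phone of any other word.
            lefts = {pron[i - 1]} if i > 0 else lasts | {boundary}
            rights = {pron[i + 1]} if i + 1 < len(pron) else firsts | {boundary}
            for left in lefts:
                for right in rights:
                    needed.add(f"{left}-{phone}+{right}")
    return needed


models = cross_word_models([["b", "i", "t"], ["b", "u", "t"]])
print(sorted(models))
```

Under these assumptions the enumerated set coincides with the eleven models listed in the HMM set above.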
Now suppose that, still using the simple monophone pronunciations, the HMM set contained all monophones, biphones and triphones. In this case, the default would be to generate the monophone network of Fig. 11.8. If FORCECXTEXP is true but ALLOWXWRDEXP is set false, then the word-internal network of Fig. 11.9 would be generated. Finally, if both FORCECXTEXP and ALLOWXWRDEXP are set true, then the cross-word network of Fig. 11.10 would be generated.
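The effect of the configuration parameters in this last example can be summarised as a small decision function. This is a sketch of the cases stated in the text only, not of HNET's actual logic: the function choose_expansion is hypothetical, the precedence of ALLOWCXTEXP over FORCECXTEXP is an assumption, and the case of a dictionary that is not closed with no forcing is left to HNET's own inference and is returned here as "inferred".

```python
def choose_expansion(closed, allow_cxt_exp=True, allow_xwrd_exp=False,
                     force_cxt_exp=False):
    """Return the kind of expansion implied by the parameter settings.

    Arguments mirror ALLOWCXTEXP, ALLOWXWRDEXP and FORCECXTEXP, with the
    defaults given in the text (FORCECXTEXP is assumed to default to false).
    """
    if not allow_cxt_exp:
        # No expansion of phone names; each phone maps to the model of the
        # same name. Assumed here to take precedence over forcing.
        return "none"
    if force_cxt_exp:
        return "cross-word" if allow_xwrd_exp else "word-internal"
    if closed:
        # Closed dictionary and nothing forced: use the labels as they are.
        return "none"
    # Not covered by this sketch: HNET infers the type from the model list.
    return "inferred"


default_case = choose_expansion(closed=True)
forced_internal = choose_expansion(closed=True, force_cxt_exp=True)
forced_cross = choose_expansion(closed=True, force_cxt_exp=True,
                                allow_xwrd_exp=True)
print(default_case, forced_internal, forced_cross)
```

The three calls correspond, in order, to the networks of Fig. 11.8, Fig. 11.9 and Fig. 11.10 for the monophone dictionary with a complete HMM set.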