13.11.2 Network Definition

The syntax rules for the textual definition of the network are as follows. Each node in the network has a nodename. This node name will normally correspond to a word in the final syntax network. Additionally, for use in compatibility mode, each node can also have an external name.

 
   name     = char{char}
   nodename = name [ "%" ( "%" | name ) ]

Here char represents any character except one of the meta chars { } [ ] < > | = $ ( ) ; / *. A meta char may, however, be included in a name by escaping it with a backslash. The first name in a nodename is the name of the node (the "internal" name), and the optional second name is the "external" name. The external name is used only in compatibility mode and is, by default, the same as the internal name.
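The nodename rule can be illustrated with a short Python sketch. This is not HParse code: the function name `split_nodename` is hypothetical, and treating an unescaped % as the separator and returning None for the "name%%" form (whose meaning is not spelled out in this section) are assumptions made for the illustration.

```python
# Illustrative sketch only; not part of HTK.
META_CHARS = set("{}[]<>|=$();/*")

def split_nodename(text):
    """Split a nodename into (internal, external).

    The external name defaults to the internal name. For the 'name%%'
    form the external name is returned as None (an assumption).
    """
    names = [[]]              # characters of each name, split at '%'
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "\\" and i + 1 < len(text):
            names[-1].append(text[i + 1])   # escaped char is taken literally
            i += 2
            continue
        if ch in META_CHARS:
            raise ValueError(f"unescaped meta char {ch!r} in {text!r}")
        if ch == "%":
            names.append([])                # start the next name
        else:
            names[-1].append(ch)
        i += 1
    parts = ["".join(p) for p in names]
    if not parts[0]:
        raise ValueError("empty internal name")
    if len(parts) == 1:
        return parts[0], parts[0]           # external defaults to internal
    if len(parts) == 2 and parts[1]:
        return parts[0], parts[1]
    if len(parts) == 3 and parts[1] == "" and parts[2] == "":
        return parts[0], None               # the 'name%%' form
    raise ValueError(f"malformed nodename {text!r}")

print(split_nodename("delete"))       # ('delete', 'delete')
print(split_nodename("delete%del"))   # ('delete', 'del')
```
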

Network definitions may also contain variables:

 
   variable = "$" name

Variables are identified by a leading $ character. They stand for sub-networks, and each must be defined before it appears in the RHS of a rule. A definition takes the form

 
   subnet   = variable "=" expr ";"

An expr consists of a set of alternative sequences representing parallel branches of the network.

 
   expr     = sequence {"|" sequence}
   sequence = factor{factor}

Each sequence is a series of one or more factors, where a factor is either a node name, a variable representing some sub-network, or an expression enclosed in one of several kinds of brackets.

 
   factor   = "(" expr ")"   |
              "{" expr "}"   |
              "<" expr ">"   |
              "[" expr "]"   |
              "<<" expr ">>" |
              nodename       |
              variable

Ordinary parentheses () denote simple factoring, curly braces { } denote zero or more repetitions and angle brackets <> denote one or more repetitions. Square brackets [] are used to enclose optional items. The double angle brackets are a special feature included for building context dependent loops and are explained further below. Finally, the complete network is defined by a list of sub-network definitions followed by a single expression within parentheses.
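As a rough analogy, the four ordinary bracket types behave like regular-expression quantifiers applied to sequences of words. The Python sketch below makes that correspondence concrete for the toy network ( a {b} <c> [d] ). It is only an illustration under that analogy: all helper names are hypothetical, HParse itself builds a lattice rather than a regular expression, and the double angle brackets have no counterpart here.

```python
import re

def word(name):
    # A node: the word itself followed by a separating space.
    return re.escape(name) + " "

def paren(*parts):          # ( )  simple factoring
    return "(?:" + "".join(parts) + ")"

def zero_or_more(*parts):   # { }  zero or more repetitions
    return paren(*parts) + "*"

def one_or_more(*parts):    # < >  one or more repetitions
    return paren(*parts) + "+"

def optional(*parts):       # [ ]  optional item
    return paren(*parts) + "?"

def accepts(pattern, words):
    """True if the word sequence is matched in full by the pattern."""
    text = "".join(w + " " for w in words)
    return re.fullmatch(pattern, text) is not None

# Toy network: ( a {b} <c> [d] )
pattern = paren(word("a"),
                zero_or_more(word("b")),
                one_or_more(word("c")),
                optional(word("d")))

print(accepts(pattern, ["a", "c"]))                      # True
print(accepts(pattern, ["a", "b", "b", "c", "c", "d"]))  # True
print(accepts(pattern, ["a", "b", "d"]))                 # False: <c> needs at least one c
```
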

 
   network  = {subnet} "(" expr ")"

Note that C style comments may be placed anywhere in the text of the network definition.

As an example, the following network defines a syntax for some simple edit commands:

   $dir   = up | down | left | right;
   $mvcmd = move $dir | top | bottom;      
   $item  = char | word | line | page;
   $dlcmd = delete [$item];   /* default is char */
   $incmd = insert;
   $encmd = end [insert];
   $cmd = $mvcmd|$dlcmd|$incmd|$encmd;
   ({sil} < $cmd {sil} > quit)
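To see which commands the $cmd sub-network covers, its alternatives can be enumerated. The Python sketch below mirrors the sub-network definitions with three hypothetical helpers (`alt` for "|", `seq` for juxtaposition, `opt` for square brackets). It deliberately stops at $cmd, because the final expression contains repetitions and therefore describes an unbounded set of sentences.

```python
from itertools import product

def word(name):
    return [(name,)]            # a node: one phrase containing one word

def alt(*options):              # "|" : any one of the alternatives
    return [phrase for option in options for phrase in option]

def seq(*factors):              # juxtaposition: one phrase from each factor, in order
    return [sum(combo, ()) for combo in product(*factors)]

def opt(expr):                  # "[ ]" : the expression or nothing
    return [()] + expr

# The sub-network definitions, transcribed one by one.
dir_ = alt(word("up"), word("down"), word("left"), word("right"))
mvcmd = alt(seq(word("move"), dir_), word("top"), word("bottom"))
item = alt(word("char"), word("word"), word("line"), word("page"))
dlcmd = seq(word("delete"), opt(item))
incmd = word("insert")
encmd = seq(word("end"), opt(word("insert")))
cmd = alt(mvcmd, dlcmd, incmd, encmd)

commands = [" ".join(phrase) for phrase in cmd]
print(len(commands))    # 14
print(commands[:6])     # the six move commands
```

The count of 14 is 6 move commands, 5 delete commands, insert, and the 2 forms of end.
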

Double angle brackets are used to construct contextually consistent context-dependent loops such as a word-pair grammar. This function can also be used to generate consistent triphone loops for phone recognition. The entry and exit conditions to a context-dependent loop can be controlled by the invisible pseudo-words TLOOP_BEGIN and TLOOP_END. The right context of TLOOP_BEGIN defines the legal loop start nodes, and the left context of TLOOP_END defines the legal loop finishers. If TLOOP_BEGIN/TLOOP_END are not present then all models are connected to the entry/exit of the loop.

A word-pair grammar simply defines the legal set of words that can follow each word in the vocabulary. A right context-dependent loop can be used to generate a network representing such a grammar. The legal sets of sentence-start and sentence-end words are defined as above using TLOOP_BEGIN/TLOOP_END.

For example, the following lists the legal followers for each word in a 7 word vocabulary

   ENTRY     - show, tell, give
   show      - me, all
   tell      - me, all
   me        - all
   all       - names, addresses
   names     - and, names, addresses, show, tell, EXIT
   addresses - and, names, addresses, show, tell, EXIT
   and       - names, addresses, show, tell
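The follower table can be read as a simple acceptance test: a sentence is legal if every adjacent pair of words, including the ENTRY and EXIT markers, appears in the table. The Python sketch below encodes the table as a dictionary and checks a word sequence against it. The names `FOLLOWERS` and `is_legal` are hypothetical, and since the table lists no followers for give, the sketch assumes that nothing may follow it.

```python
# Follower table transcribed from the listing above.
FOLLOWERS = {
    "ENTRY":     {"show", "tell", "give"},
    "show":      {"me", "all"},
    "tell":      {"me", "all"},
    "me":        {"all"},
    "all":       {"names", "addresses"},
    "names":     {"and", "names", "addresses", "show", "tell", "EXIT"},
    "addresses": {"and", "names", "addresses", "show", "tell", "EXIT"},
    "and":       {"names", "addresses", "show", "tell"},
}

def is_legal(words):
    """True if each word is a listed follower of the word before it."""
    previous = "ENTRY"
    for current in list(words) + ["EXIT"]:
        # Words without a table entry (here: give) have no legal followers.
        if current not in FOLLOWERS.get(previous, set()):
            return False
        previous = current
    return True

print(is_legal(["show", "me", "all", "names"]))                # True
print(is_legal(["tell", "all", "addresses", "and", "names"]))  # True
print(is_legal(["show", "names"]))                             # False
```
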

HParse can generate a suitable lattice to represent this word-pair grammar by using the following specification:

   $TLOOP_BEGIN_FLLWRS = show|tell|give;
   $TLOOP_END_PREDS    = names|addresses;
   $show_FLLWRS        = me|all;
   $tell_FLLWRS        = me|all;
   $me_FLLWRS          = all;
   $all_FLLWRS         = names|addresses;
   $names_FLLWRS       = and|names|addresses|show|tell|TLOOP_END;
   $addresses_FLLWRS   = and|names|addresses|show|tell|TLOOP_END;
   $and_FLLWRS         = names|addresses|show|tell;
     
   ( sil << 
         TLOOP_BEGIN+TLOOP_BEGIN_FLLWRS |
         TLOOP_END_PREDS-TLOOP_END |
         show+show_FLLWRS |
         tell+tell_FLLWRS |
         me+me_FLLWRS |
         all+all_FLLWRS |
         names+names_FLLWRS |
         addresses+addresses_FLLWRS |
         and+and_FLLWRS 
     >> sil )

where it is assumed that each utterance begins and ends with a sil model.

In this example, each set of contexts is defined by creating a variable whose alternatives are the individual contexts. The actual context-dependent loop is indicated by the << >> brackets. Each element in this loop is a single variable name of the form A-B+C where A represents the left context, C represents the right context and B is the actual word. Each of A, B and C can be nodenames or variable names but note that this is the only case where variable names are expanded automatically and the usual $ symbol is not used. Both A and C are optional, and left and right contexts can be mixed in the same triphone loop.
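The A-B+C naming convention can be made concrete with a small Python sketch that splits a loop element into its optional left context, word, and optional right context, and then expands a context name through a table of variables. This is an illustration only, not HParse's implementation: `split_context`, `expand` and the `variables` dictionary are hypothetical, and the sketch assumes that the names themselves contain no - or + characters.

```python
def split_context(element):
    """Split 'A-B+C' into (left, word, right); absent contexts are None."""
    left, dash, rest = element.partition("-")
    if not dash:                      # no left context present
        left, rest = None, element
    word, plus, right = rest.partition("+")
    if not plus:                      # no right context present
        right = None
    if not word or left == "" or right == "":
        raise ValueError(f"malformed loop element {element!r}")
    return left, word, right

def expand(name, variables):
    """Expand a variable name (written without '$') to its alternatives."""
    return variables.get(name, [name])    # a plain nodename stands for itself

# Two of the context sets from the specification above.
variables = {
    "show_FLLWRS": ["me", "all"],
    "TLOOP_END_PREDS": ["names", "addresses"],
}

left, word, right = split_context("show+show_FLLWRS")
print(left, word, right)                              # None show show_FLLWRS
print([(word, w) for w in expand(right, variables)])  # [('show', 'me'), ('show', 'all')]
print(split_context("TLOOP_END_PREDS-TLOOP_END"))     # ('TLOOP_END_PREDS', 'TLOOP_END', None)
```
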

ECRL HTK_V2.1