The syntax rules for the textual definition of the network are
as follows. Each node in the network has a nodename.
This node name will normally correspond to a word in the final syntax
network. Additionally, for use in compatibility mode,
each node can also have an external name.
nodename = name [ "%" ( "%" | name ) ]
name = char{char}
Here char represents any character except one of the meta chars
{ } [ ] < > | = $ ( ) ; / *
A meta char may, however, be included in a name by escaping it with a backslash. The first name in a nodename
represents the name of the node (``internal name''), and the second, optional name is
the ``external'' name. The external name is used only in compatibility mode and is, by default,
the same as the internal name.
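The nodename rule can be illustrated with a small Python sketch. It is not part of the tool itself: the function names are hypothetical, the meta char set is copied from the list above, and the ``%%'' form is simply returned as a ``%'' marker because its meaning is not described here.

```python
# Meta chars as listed in the text; any of them must be backslash-escaped in a name.
META_CHARS = set("{}[]<>|=$();/*")


def unescape(raw):
    """Return the literal name, rejecting unescaped meta chars (hypothetical helper)."""
    out, i = [], 0
    while i < len(raw):
        if raw[i] == "\\" and i + 1 < len(raw):
            out.append(raw[i + 1])  # escaped char is taken literally
            i += 2
        elif raw[i] in META_CHARS:
            raise ValueError(f"unescaped meta char {raw[i]!r} in name")
        else:
            out.append(raw[i])
            i += 1
    if not out:
        raise ValueError("empty name")
    return "".join(out)


def find_unescaped_percent(raw):
    """Index of the first '%' not preceded by a backslash, or -1."""
    i = 0
    while i < len(raw):
        if raw[i] == "\\":
            i += 2
            continue
        if raw[i] == "%":
            return i
        i += 1
    return -1


def split_nodename(raw):
    """Split a nodename into (internal, external) names."""
    cut = find_unescaped_percent(raw)
    if cut < 0:
        name = unescape(raw)
        return name, name  # external name defaults to the internal name
    internal = unescape(raw[:cut])
    rest = raw[cut + 1:]
    # Assumption: the "%%" form is reported as a bare "%" marker, uninterpreted.
    external = "%" if rest == "%" else unescape(rest)
    return internal, external
```

For instance, `split_nodename("one%1")` yields the internal name `one` and the external name `1`, while a plain `show` yields the same string for both.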
Network definitions may also contain variables.
variable = $name
Variables are identified by a leading $ character. They stand for
sub-networks and must be defined before they appear in the RHS of a rule
using the form
subnet = variable ``='' expr ``;''
An expr consists of a set of alternative sequences representing
parallel branches of the network.
sequence = factor{factor}
expr = sequence {``|'' sequence}
Each sequence is composed of a sequence of factors where a factor is either a node name, a variable representing some sub-network or an expression contained within various sorts of brackets.
factor = ``('' expr ``)'' |
         ``{'' expr ``}'' |
         ``<'' expr ``>'' |
         ``['' expr ``]'' |
         ``<<'' expr ``>>'' |
         nodename |
         variable
Ordinary parentheses () denote simple factoring, curly braces { }
denote zero or more repetitions and angle brackets <> denote one
or more repetitions. Square brackets [] are used to enclose optional
items. The double angle brackets are a special feature included
for building context dependent loops and are explained further below.
Finally, the complete network is defined by a list of sub-network
definitions followed by a single expression within parentheses.
network = {subnet} ``('' expr ``)''
Note that C style comments may be placed anywhere in the text of the network definition.
As an example, the following network defines a syntax for some simple edit commands:
$dir = up | down | left | right;
$mvcmd = move $dir | top | bottom;
$item = char | word | line | page;
$dlcmd = delete [$item]; /* default is char */
$incmd = insert;
$encmd = end [insert];
$cmd = $mvcmd|$dlcmd|$incmd|$encmd;
({sil} < $cmd {sil} > quit)
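To see which word sequences this network admits, the bracket operators can be mirrored by regular-expression operators: { } becomes `*`, < > becomes `+` and [ ] becomes `?`. The Python sketch below is only an analogy for checking sentences by hand; the constant names and the `accepts` helper are hypothetical, and it does not reproduce how the network itself is built.

```python
import re

# Each variable of the example becomes a regex fragment over space-separated words.
DIR = r"(?:up|down|left|right)"
MVCMD = rf"(?:move {DIR}|top|bottom)"
ITEM = r"(?:char|word|line|page)"
DLCMD = rf"(?:delete(?: {ITEM})?)"   # [ $item ] : optional item
INCMD = r"insert"
ENCMD = r"(?:end(?: insert)?)"       # [ insert ] : optional word
CMD = rf"(?:{MVCMD}|{DLCMD}|{INCMD}|{ENCMD})"

# ({sil} < $cmd {sil} > quit)
#   {sil}          -> zero or more "sil "
#   < $cmd {sil} > -> one or more commands, each followed by optional silences
NETWORK = re.compile(rf"(?:sil )*(?:{CMD} (?:sil )*)+quit")


def accepts(sentence):
    """True if the whitespace-separated word sequence is in the example syntax."""
    normalised = " ".join(sentence.split())
    return NETWORK.fullmatch(normalised) is not None
```

Under this sketch `accepts("sil move up sil delete word quit")` holds, whereas a bare `quit` is rejected because the angle brackets require at least one command.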
Double angle brackets are used to
construct contextually consistent context-dependent loops such
as a word-pair grammar.
This function can also be used to generate consistent triphone loops
for phone recognition.
The entry and exit conditions to a
context-dependent loop can be controlled by the invisible
pseudo-words TLOOP_BEGIN and TLOOP_END. The right context of TLOOP_BEGIN
defines the legal loop start nodes, and the left context of TLOOP_END
defines the legal loop finishers. If TLOOP_BEGIN/TLOOP_END are not
present then all models are connected to the entry/exit of the loop.
A word-pair grammar simply defines the legal set of words that can follow each word in the vocabulary. To generate a network to represent such a grammar, a right context-dependent loop could be used. The legal sets of sentence start and end words are defined as above using TLOOP_BEGIN/TLOOP_END.
For example, the following lists the legal followers for each word in a 7 word vocabulary
ENTRY - show, tell, give
show - me, all
tell - me, all
me - all
all - names, addresses
names - and, names, addresses, show, tell, EXIT
addresses - and, names, addresses, show, tell, EXIT
and - names, addresses, show, tell
HPARSE can generate a suitable lattice to represent this word-pair grammar by using the following specification:
$TLOOP_BEGIN_FLLWRS = show|tell|give;
$TLOOP_END_PREDS = names|addresses;
$show_FLLWRS = me|all;
$tell_FLLWRS = me|all;
$me_FLLWRS = all;
$all_FLLWRS = names|addresses;
$names_FLLWRS = and|names|addresses|show|tell|TLOOP_END;
$addresses_FLLWRS = and|names|addresses|show|tell|TLOOP_END;
$and_FLLWRS = names|addresses|show|tell;
( sil <<
TLOOP_BEGIN+TLOOP_BEGIN_FLLWRS |
TLOOP_END_PREDS-TLOOP_END |
show+show_FLLWRS |
tell+tell_FLLWRS |
me+me_FLLWRS |
all+all_FLLWRS |
names+names_FLLWRS |
addresses+addresses_FLLWRS |
and+and_FLLWRS
>> sil )
where it is assumed that each utterance begins and ends with a sil
model.
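The constraint that a word-pair grammar imposes can be checked directly from the follower lists. The following Python sketch copies the follower table given earlier (with ENTRY and EXIT as the sentence boundaries) and tests whether every adjacent pair of words is allowed; the names `FOLLOWERS` and `is_legal` are hypothetical and the code is independent of the lattice that is actually generated.

```python
# Legal followers for each word, transcribed from the listing in the text.
FOLLOWERS = {
    "ENTRY": {"show", "tell", "give"},
    "show": {"me", "all"},
    "tell": {"me", "all"},
    "me": {"all"},
    "all": {"names", "addresses"},
    "names": {"and", "names", "addresses", "show", "tell", "EXIT"},
    "addresses": {"and", "names", "addresses", "show", "tell", "EXIT"},
    "and": {"names", "addresses", "show", "tell"},
}


def is_legal(words):
    """True if each word is a listed follower of the word before it."""
    path = ["ENTRY", *words, "EXIT"]
    return all(
        nxt in FOLLOWERS.get(prev, set())
        for prev, nxt in zip(path, path[1:])
    )
```

With this table, `is_legal("show me all names".split())` holds, while a sentence ending in `and` is rejected because EXIT is not among the followers of `and`.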
In this example, each set of contexts is defined by creating a variable
whose alternatives are the individual contexts. The actual context-dependent
loop is indicated by the << >> brackets.
Each element in this loop is a single
variable name of the form A-B+C where A represents the left
context, C represents the right context and B is the actual word.
Each of A, B and C can be nodenames or
variable names, but note that this is the only case where variable
names are expanded automatically and the usual
$ symbol is not used. Both A and C are optional, and left and
right contexts can be mixed in the same triphone loop.
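The A-B+C form can be taken apart mechanically. The Python sketch below splits a loop element into its optional left context, the word, and its optional right context; `split_context` is a hypothetical helper, and it assumes that the names themselves contain no ``-'' or ``+'' characters.

```python
def split_context(element):
    """Split 'A-B+C' into (left, word, right); a missing context is None."""
    left, sep, rest = element.partition("-")
    if not sep:
        # No '-' present: there is no left context.
        left, rest = None, element
    word, sep, right = rest.partition("+")
    if not sep:
        # No '+' present: there is no right context.
        right = None
    return left, word, right
```

Applied to the loop in the example, `split_context("show+show_FLLWRS")` gives the word `show` with right context `show_FLLWRS`, and `split_context("TLOOP_END_PREDS-TLOOP_END")` gives the word `TLOOP_END` with left context `TLOOP_END_PREDS`.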