Lab 4 --- Large Vocabulary Decoding: A Love Story
Prev		Next

2. Part 1: Playing With FSM's and the IBM FSM Toolkit

In this part, we introduce the IBM FSM toolkit as a first step in learning more about the static graph expansion process. In particular, you will be using the program FsmOp, which is a utility that can perform a variety of finite-state operations on weighted FSA's and FST's. The program FsmOp is like a calculator that operates on FSM's rather than real values, where arguments are input using [reverse Polish notation] as is used in some [HP calculators]. For example, to produce the composition of an FSA held in the file foo.fsm and an FST held in bar.fsm, you would use the command
FsmOp foo.fsm bar.fsm -compose > result.fsm
Operations begin with the “-” character, and include -compose, -determinize, and -minimize among many others. By default, the resulting FSM is written to standard output, but if the last argument supplied is a filename (rather than an operation), the resulting FSM will be written to that file instead. Thus, the command
FsmOp foo.fsm bar.fsm -compose result.fsm
has the same effect as the last example.

We explain the default FSA format through an example:
1 2 foo 1 2 bar 2 3 moo 3
Each line with three fields describes an arc in the FSM; the format is: src-state dst-state label. States need not be numbered starting from 1; the label <epsilon> is used to represent the empty label. Each line with a single field lists a final state. The first state mentioned in the file is the start state. Thus, the above FSA file corresponds to:

We drew the above Postscript diagram using the following command:

FsmOp foo.fsm -draw | dot -Tps > foo.ps

You can use similar commands to help you visualize FSM's.

For weighted FSA's, each line can optionally be followed with a cost, or negative log probability base 10. If a cost is omitted on a line, it is taken to be zero. For example, here is a weighted FSA:
1 2 foo 1 2 bar 2.4 2 3 <epsilon> 2 1.2 3
corresponding to

Finite-state transducers have a similar format, except lines representing arcs have an extra field:
src-state dst-state in-label out-label [optional-cost]
In addition, to signal that a file holds an FST rather than an FSA, the following line should be included at the start of the file:
# transducer: true
Here is an example FST:
# transducer: true 1 2 ax AX 1 2 bar BAR 1.0 2 3 moo <epsilon> 2 1.2 3

For this part of the lab, you will have to create various FSM's and perform operations on them, as described in lab4.txt. Here are some hints:

Don't forget to add final states to your FSM's! Without these, your FSM's will be equivalent to empty FSM's.
For transducers, don't forget the line “# transducer: true”!
For a list of all of the operations that FsmOp can perform, run FsmOp with no arguments.

To prepare for this part, create the relevant subdirectory and copy over the needed files:

mkdir -p ~/e6884/lab4/
cd ~/e6884/lab4/
cp ~stanchen/e6884/lab4/copy/* .
cp ~stanchen/e6884/lab4/.mk_chain .

This will also copy over all files needed in later parts of the lab.

Prev	Home	Next
Overview		Part 2: Investigating the Static Graph Expansion Process