Next: 6.5 Summary Up: 6 Transcriptions and Label Files Previous: 6.3.4 MLF Examples

6.4 Editing Label Files

HTK training tools typically expect the labels used in transcription files to correspond directly to the names of the HMMs chosen to build an application. Hence, the label files supplied with a speech database will often need modifying. For example, the original transcriptions attached to a database might be at a fine level of acoustic detail. Groups of labels corresponding to a sequence of acoustic events (e.g. pcl p') might need converting to some simpler form (e.g. p) which is more suitable for being represented by an HMM. As a second example, current high-performance speech recognisers use a large number of context-dependent models to allow more accurate acoustic modelling. For this case, the labels in the transcription must be converted to show the required contexts explicitly.

HTK supplies a tool called HLEd for rapidly and efficiently converting label files. The HLEd command invocation specifies the names of the files to be converted and the name of a script file holding the actual HLEd commands. For example, the command

    HLEd edfile.led l1 l2 l3

would apply the edit commands stored in the file edfile.led to each of the label files l1, l2 and l3. More commonly the new label files are stored in a new directory to avoid overwriting the originals. This is done by using the -L option. For example,

    HLEd -L newlabs edfile.led l1 l2 l3

would have the same effect as previously except that the new label files would be stored in the directory newlabs.

Each edit command stored in an edit file is identified by a mnemonic consisting of two letters and must be stored on a separate line. The supplied edit commands can be divided into two groups. The first group consists of commands which perform selective changes to specific labels and the second group contains commands which perform global transformations. The reference section defines all of these commands. Here a few examples will be given to illustrate the use of HLEd.

As a first example, when using the TIMIT database, the original 61 phoneme symbol set is often mapped into a simpler 48 phoneme symbol set. The aim of this mapping is to delete all glottal stops, replace all closures preceding a voiced stop by a generic voiced closure (vcl), replace all closures preceding an unvoiced stop by a generic unvoiced closure (cl), and map the different types of silence to a single generic silence (sil). An HLEd script to do this might be

    # Map 61 Phone Timit Set -> 48 Phones
    SO
    DE q
    RE cl pcl tcl kcl qcl
    RE vcl bcl dcl gcl
    RE sil h# #h pau

The first line is a comment indicated by the initial hash character. The command on the second line is the Sort command SO. This is an example of a global command. Its effect is to sort all the labels into time order. Normally the labels in a transcription will already be in time order but some speech editors simply output labels in the order that the transcriber marked them. Since this would confuse the re-estimation tools, it is good practice to explicitly sort all label files in this way.

The command on the third line is the Delete command DE. This is a selective command. Its effect is to delete all of the labels listed on the rest of the command line, wherever they occur. In this case, there is just one label listed for deletion, the glottal stop q. Hence, the overall effect of this command will be to delete all occurrences of the q label in the edited label files.

The remaining commands in this example script are Replace commands RE. The effect of a Replace command is to substitute the first label following the RE for every occurrence of the remaining labels on that line. Thus, for example, the command on the fourth line causes all occurrences of the labels pcl, tcl, kcl or qcl to be replaced by the label cl.

To illustrate the overall effect of the above HLEd command script on a complete label file, the following TIMIT format label file

     0000 2241 h#
     2241 2715 w
     2715 4360 ow
     4360 5478 bcl
     5478 5643 b
     5643 6360 iy
     6360 7269 tcl
     7269 8313 t
     8313 11400 ay
    11400 12950 dcl
    12950 14360 dh
    14360 14640 h#

would be converted by the above script to the following

          0 1400625 sil 
    1400625 1696875 w 
    1696875 2725000 ow 
    2725000 3423750 vcl 
    3423750 3526875 b 
    3526875 3975000 iy 
    3975000 4543125 cl 
    4543125 5195625 t 
    5195625 7125000 ay 
    7125000 8093750 vcl 
    8093750 8975000 dh 
    8975000 9150000 sil

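Notice that label boundaries in TIMIT format are given in terms of sample numbers (16 kHz sample rate), whereas the edited output file is in HTK format in which all times are in absolute 100ns units. At 16 kHz one sample lasts 62.5 microseconds, i.e. 625 such units, so the boundary at sample 2241 becomes 2241 × 625 = 1400625.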
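The effect of the script and of the time conversion can be reproduced with a short Python sketch. This only mimics the behaviour described above for SO, DE and RE; it is not how HLEd itself is implemented, and the names samples_to_htk and edit_timit_labels are hypothetical.

```python
# Illustrative sketch only: mimics the described effect of SO, DE and RE.
SAMPLE_RATE_HZ = 16000         # TIMIT sample rate, as stated in the text
HTK_UNITS_PER_SECOND = 10**7   # HTK times are in 100 ns units

def samples_to_htk(sample_index, sample_rate_hz=SAMPLE_RATE_HZ):
    """Convert a sample number to 100 ns units (exact at 16 kHz)."""
    return sample_index * HTK_UNITS_PER_SECOND // sample_rate_hz

# Replacement table built from the three RE lines of the script.
REPLACE = {}
for new, olds in [("cl", "pcl tcl kcl qcl"),
                  ("vcl", "bcl dcl gcl"),
                  ("sil", "h# #h pau")]:
    for old in olds.split():
        REPLACE[old] = new

DELETE = {"q"}                 # from the DE line

def edit_timit_labels(lines):
    """Return (start, end, label) tuples in HTK units for TIMIT label lines."""
    segments = []
    for line in lines:
        start, end, label = line.split()
        segments.append((int(start), int(end), label))
    segments.sort(key=lambda seg: seg[0])                    # SO: time order
    segments = [s for s in segments if s[2] not in DELETE]   # DE: drop q
    return [(samples_to_htk(start), samples_to_htk(end),
             REPLACE.get(label, label))                      # RE: map labels
            for start, end, label in segments]

timit = ["0000 2241 h#", "2241 2715 w", "2715 4360 ow", "4360 5478 bcl"]
for start, end, label in edit_timit_labels(timit):
    print(start, end, label)
```

Run on the first four lines of the TIMIT file above, this prints the first four lines of the converted file.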

As well as the Replace command, there is also a Merge command ME. This command is used to replace a sequence of labels by a single label. For example, the following commands would merge the closure and release labels in the previous TIMIT transcription into single labels

    ME b bcl b
    ME d dcl dh
    ME t tcl t

As shown by this example, the label used for the merged sequence can be the same as occurs in the original but some care is needed since HLEd commands are normally applied in sequence. Thus, a command on line n is applied to the label sequence that remains after the commands on lines 1 to n-1 have been applied.
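The sequential application of Merge commands can be sketched in Python as follows. The sketch works on label names only and ignores segment times; the function name merge is hypothetical and the code illustrates the described behaviour rather than HLEd's implementation.

```python
def merge(labels, new_label, sequence):
    """Replace each occurrence of the label run `sequence` by `new_label`."""
    out = []
    i = 0
    n = len(sequence)
    while i < len(labels):
        if labels[i:i + n] == sequence:
            out.append(new_label)   # the whole run collapses to one label
            i += n
        else:
            out.append(labels[i])
            i += 1
    return out

# Labels of the TIMIT transcription shown earlier (times omitted).
labels = ["h#", "w", "ow", "bcl", "b", "iy", "tcl", "t", "ay", "dcl", "dh", "h#"]

# Each command sees the output of the previous one, as in an edit script.
script = [("b", ["bcl", "b"]), ("d", ["dcl", "dh"]), ("t", ["tcl", "t"])]
for new_label, sequence in script:
    labels = merge(labels, new_label, sequence)

print(" ".join(labels))
```

Because each command operates on the result of the one before it, a merged label such as b could itself be matched by a later command, which is why the order of lines in a script matters.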

There is one exception to the above rule of sequential edit command application. The Change command CH provides for context sensitive replacement. However, when a sequence of Change commands occurs in a script, the sequence is applied as a block so that the contexts which apply for each command are those that existed just prior to the block being executed. The Change command takes 4 arguments X A Y B such that every occurrence of label Y in the context of A _ B is changed to the label X. The contexts A and B refer to sets of labels and are defined by separate Define Context commands DC.

The CH and DC commands are primarily used for creating context sensitive labels. For example, suppose that a set of context-dependent phoneme models is needed for TIMIT. Rather than treat all possible contexts separately and build separate triphones for each (see below), the possible contexts will be grouped into just 5 broad classes: C (consonant), V (vowel), N (nasal), L (liquid) and S (silence). The goal then is to translate a label sequence such as sil b ah t iy n ... into sil+C S-b+V C-ah+C V-t+V C-iy+N V-n+ ... where the - and + symbols within a label are recognised by HTK as defining the left and right context, respectively. To perform this transformation, it is necessary first to use DC commands to define the 5 contexts, that is

    DC V iy ah ae eh ix ... 
    DC C t k d k g dh ... 
    DC L l r w j ...
    DC N n m ng ...
    DC S h# #h epi ...

Having defined the required contexts, a change command must be written for each context dependent triphone, that is

    CH V-ah+V V ah V
    CH V-ah+C V ah C
    CH V-ah+N V ah N
    CH V-ah+L V ah L
     ...
     etc

This script will, of course, be rather long (25 × number of phonemes, since each phoneme needs one command for every pairing of the 5 left and 5 right contexts) but it can easily be generated automatically by a simple program or shell script.
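A generator of this kind might look like the following Python sketch. The phoneme list here is a hypothetical two-item placeholder, the function name change_commands is an assumption, and commands for phonemes at the start or end of a file (which lack one context) are not covered.

```python
from itertools import product

CONTEXTS = ["C", "V", "N", "L", "S"]   # the five broad classes from the text

def change_commands(phonemes, contexts=CONTEXTS):
    """Yield one 'CH X A Y B' line per (phoneme, left, right) combination."""
    for phone in phonemes:
        for left, right in product(contexts, repeat=2):
            # X is the new label, A and B the context names, Y the phoneme.
            yield f"CH {left}-{phone}+{right} {left} {phone} {right}"

phonemes = ["ah", "iy"]                # hypothetical placeholder list
for line in change_commands(phonemes):
    print(line)
```

Each phoneme contributes 5 × 5 = 25 lines, so the two-phoneme placeholder list yields 50 commands.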

The previous example shows how to transform a set of phonemes into a context dependent set in which the contexts are user-defined. For convenience, HLEd provides a set of global transformation commands for converting phonemic transcriptions to conventional left or right biphones, or full triphones. For example, a script containing the single Triphone Conversion command TC will convert phoneme files to regular triphones. As an illustration, applying the TC command to a file containing the sequence sil b ah t iy n ... would give the transformed sequence sil+b sil-b+ah b-ah+t ah-t+iy t-iy+n iy-n+ ....

Notice that the first and last phonemes in the sequence cannot be transformed in the normal way. Hence, the left-most and right-most contexts of these start and end phonemes can be specified explicitly as arguments to the TC command if required. For example, the command TC # # would give the sequence #-sil+b sil-b+ah b-ah+t ah-t+iy t-iy+n iy-n+ ... +#.

Also, the contexts at pauses and word boundaries can be blocked using the WB command. For example, if WB sp was executed, the effect of a subsequent TC command on the sequence sil b ah t sp iy n ... would be to give the sequence sil+b sil-b+ah b-ah+t ah-t sp iy+n iy-n+ ..., where sp represents a short pause. Conversely, the NB command can be used to ignore a label as far as context is concerned. For example, if NB sp was executed, the effect of a subsequent TC command on the sequence sil b ah t sp iy n ... would be to give the sequence sil+b sil-b+ah b-ah+t ah-t+iy sp t-iy+n iy-n+ ....
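The triphone transformations described above can be illustrated with a small Python sketch. It reproduces the example sequences for TC, WB and NB but is not HLEd's implementation; the function names and keyword arguments are hypothetical, and it assumes that WB and NB labels are themselves copied unchanged.

```python
def _neighbour(labels, i, step, wb, nb, edge_context):
    """Find the context label on one side of position i, or None if blocked."""
    j = i + step
    while 0 <= j < len(labels):
        if labels[j] in wb:
            return None            # WB label: context is blocked here
        if labels[j] not in nb:
            return labels[j]       # ordinary label: use it as the context
        j += step                  # NB label: skip over it
    return edge_context            # ran off the end of the sequence

def to_triphones(labels, word_boundaries=(), non_boundaries=(),
                 start_context=None, end_context=None):
    """Convert a phoneme sequence to left-phone+right triphone labels."""
    wb, nb = set(word_boundaries), set(non_boundaries)
    out = []
    for i, label in enumerate(labels):
        if label in wb or label in nb:
            out.append(label)      # boundary symbols are copied unchanged
            continue
        left = _neighbour(labels, i, -1, wb, nb, start_context)
        right = _neighbour(labels, i, +1, wb, nb, end_context)
        name = label
        if left is not None:
            name = f"{left}-{name}"
        if right is not None:
            name = f"{name}+{right}"
        out.append(name)
    return out

seq = ["sil", "b", "ah", "t", "sp", "iy", "n"]
print(" ".join(to_triphones(seq, word_boundaries=["sp"])))
print(" ".join(to_triphones(seq, non_boundaries=["sp"])))
```

Here the sequence ends at n, so the final label is iy-n rather than the open-ended iy-n+ ... shown in the text. Passing start_context="#" and end_context="#" corresponds to the TC # # example.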

When processing HTK format label files with multiple levels, only the level 1 (i.e. left-most) labels are affected. To process a higher level, the Move Level command ML should be used. For example, in the script

    ML 2
    RE one 1
    RE two 2
    ...

the Replace commands are applied to level 2 which is the first level above the basic level. The command ML 1 returns to the base level. A complete level can be deleted by the Delete Level command DL. Given a numeric argument, this command deletes the specified level; with no argument, it deletes the current level. Multiple levels can also be split into single level alternatives by using the Split Level command SL.

When processing HTK format files with multiple alternatives, each alternative is processed as though it were a separate file.

Remember also that in addition to the explicit HLED commands, levels and alternatives can be filtered on input by setting the configuration variables TRANSLEV and TRANSALT (see section 6.1).

Finally, it should be noted that most HTK tools require all HMMs used in a system to be defined in an HMM List. HLEd can be made to automatically generate such a list as a by-product of editing the label files by using the -n option. For example, the following command would apply the script timit.led to all files in the directory tlabs, write the converted files to the directory hlabs and also write out a list of all new labels in the edited files to tlist.

    HLEd -n tlist -L hlabs -G TIMIT timit.led tlabs/*

Notice here that the -G option is used to inform HLEd that the format of the source files is TIMIT. This could also be indicated by setting the configuration variable SOURCELABEL.
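As a rough illustration of what such a label list contains, the following Python sketch collects the distinct labels found in HTK format label lines. It assumes every line has the simple form start end label; the function name collect_labels is hypothetical, and the order in which HLEd writes its list is not reproduced here.

```python
def collect_labels(label_lines):
    """Return the distinct labels in order of first appearance."""
    seen = {}                      # dict keys keep insertion order
    for line in label_lines:
        fields = line.split()
        if len(fields) >= 3:       # assumed layout: start end label
            seen.setdefault(fields[2], None)
    return list(seen)

edited = ["0 1400625 sil", "1400625 1696875 w", "8975000 9150000 sil"]
for name in collect_labels(edited):
    print(name)
```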


ECRL HTK_V2.1: email [email protected]