Next: 6.2.2 ESPS Label Files Up: 6.2 Label File Formats Previous: 6.2 Label File Formats

6.2.1 HTK Label Files

The HTK label format is text based. As noted above, a single label file can contain multiple alternatives and multiple levels.

Each line of an HTK label file contains the actual label, optionally preceded by start and end times and optionally followed by a match score.

    [start  [end] ] name [score] { auxname [auxscore] } [comment]
where start denotes the start time of the labelled segment in 100 ns units, end denotes the end time in 100 ns units, name is the name of the segment and score is a floating-point confidence score. All fields except the name are optional. If end is omitted, it is set to -1 and ignored; this case arises with data that has been labelled frame-synchronously. If start and end are both missing, both are set to -1 and the label file is treated as a simple symbolic transcription. The optional score would typically be a log probability generated by a recognition tool. When omitted, the score is set to 0.0.
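
The field rules above can be sketched as a small Python parser. This is only an illustration under stated assumptions, not the parser used by the HTK tools: the names Label and parse_label_line are hypothetical, leading integer tokens are assumed to be times, a numeric token after a name is assumed to be its score, and the trailing comment field is not handled.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Label:
    """One parsed label line (hypothetical container, not an HTK type)."""
    start: int = -1          # start time in 100 ns units, -1 if absent
    end: int = -1            # end time in 100 ns units, -1 if absent
    name: str = ""
    score: float = 0.0       # defaults to 0.0 when omitted
    aux: List[Tuple[str, float]] = field(default_factory=list)


def _is_int(tok: str) -> bool:
    try:
        int(tok)
        return True
    except ValueError:
        return False


def _is_float(tok: str) -> bool:
    try:
        float(tok)
        return True
    except ValueError:
        return False


def parse_label_line(line: str) -> Label:
    """Parse '[start [end]] name [score] { auxname [auxscore] }'."""
    toks = line.split()
    if not toks:
        raise ValueError("empty label line")
    i = 0
    start = end = -1
    # Assumption: up to two leading integers are times, as long as at
    # least one token is left over to serve as the name.
    if len(toks) > 1 and _is_int(toks[0]):
        start = int(toks[0])
        i = 1
        if len(toks) > 2 and _is_int(toks[1]):
            end = int(toks[1])
            i = 2
    name = toks[i]
    i += 1
    score = 0.0
    # Assumption: a numeric token directly after the name is its score.
    if i < len(toks) and _is_float(toks[i]):
        score = float(toks[i])
        i += 1
    aux = []
    while i < len(toks):
        aux_name = toks[i]
        i += 1
        aux_score = 0.0
        if i < len(toks) and _is_float(toks[i]):
            aux_score = float(toks[i])
            i += 1
        aux.append((aux_name, aux_score))
    return Label(start, end, name, score, aux)
```

With this sketch, parse_label_line("ice") yields start and end of -1 and a score of 0.0, matching the defaults described above. Label names that themselves look like numbers would confuse these heuristics.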

The following example corresponds to the transcription shown in part (a) of Fig. 6.1:

    0000000 3600000 ice
    3600000 8200000 cream
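
Since times are stored in 100 ns units, one second corresponds to 10,000,000 units. As a quick check of the arithmetic, the following Python sketch converts the times of the example above to seconds; the helper name htk_to_seconds is hypothetical and is not part of HTK.

```python
HTK_UNITS_PER_SECOND = 10_000_000  # 1 s / 100 ns


def htk_to_seconds(t: int) -> float:
    """Convert a time in 100 ns units to seconds."""
    return t / HTK_UNITS_PER_SECOND


example = """\
0000000 3600000 ice
3600000 8200000 cream
"""

for line in example.splitlines():
    start, end, name = line.split()
    print(name, htk_to_seconds(int(start)), htk_to_seconds(int(end)))
# prints:
# ice 0.0 0.36
# cream 0.36 0.82
```
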
Multiple levels are described by adding further names alongside the basic name. The lowest level (shortest segments) should be given first since only the lowest level has start and end times. The label file corresponding to the transcription illustrated in part (b) of Fig. 6.1 would be as follows.
    0000000 2200000 ay     ice
    2200000 3600000 s
    3600000 4300000 k      cream
    4300000 5000000 r
    5000000 7400000 iy
    7400000 8200000 m
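
To illustrate how the higher level relates to the lowest one, the Python sketch below regroups the phone-level lines above into word-level segments. It assumes that a higher-level name spans every line up to the next line that carries a name at that level, and that no scores are present. The function name group_levels is hypothetical.

```python
def group_levels(text: str) -> list:
    """Group 'start end phone [word]' lines into word-level segments."""
    words = []
    for line in text.strip().splitlines():
        toks = line.split()
        start, end, phone = int(toks[0]), int(toks[1]), toks[2]
        if len(toks) > 3:
            # A second name on the line opens a new higher-level segment.
            words.append({"word": toks[3], "start": start,
                          "end": end, "phones": [phone]})
        elif words:
            # Otherwise the line extends the current higher-level segment.
            words[-1]["end"] = end
            words[-1]["phones"].append(phone)
        else:
            raise ValueError("first line has no higher-level name")
    return words


two_level = """\
0000000 2200000 ay     ice
2200000 3600000 s
3600000 4300000 k      cream
4300000 5000000 r
5000000 7400000 iy
7400000 8200000 m
"""

for w in group_levels(two_level):
    print(w["start"], w["end"], w["word"], w["phones"])
# prints:
# 0 3600000 ice ['ay', 's']
# 3600000 8200000 cream ['k', 'r', 'iy', 'm']
```

The recovered word boundaries agree with the single-level transcription of part (a).
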
Finally, multiple alternatives are written as a sequence of separate label lists separated by three slashes (///). The label file corresponding to the transcription illustrated in part (c) of Fig. 6.1 would therefore be as follows.
    0000000 2200000 I
    2200000 8200000 scream
    ///
    0000000 3600000 ice
    3600000 8200000 cream
    ///
    0000000 3600000 eyes
    3600000 8200000 cream
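
Splitting a label file into its alternatives can be sketched in Python as follows. The sketch assumes that the /// separator stands alone on its line, as in the example above, and that blank lines can be skipped; split_alternatives is a hypothetical helper name.

```python
def split_alternatives(text: str) -> list:
    """Return one list of label lines per alternative."""
    alternatives = [[]]
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue  # skip blank lines
        if stripped == "///":
            alternatives.append([])  # separator starts the next list
        else:
            alternatives[-1].append(stripped)
    return alternatives


label_file = """\
0000000 2200000 I
2200000 8200000 scream
///
0000000 3600000 ice
3600000 8200000 cream
///
0000000 3600000 eyes
3600000 8200000 cream
"""

for alt in split_alternatives(label_file):
    print([line.split()[2] for line in alt])
# prints:
# ['I', 'scream']
# ['ice', 'cream']
# ['eyes', 'cream']
```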

Actual label names can be any sequence of characters. However, the - and + characters are reserved for identifying the left and right context, respectively, in a context-dependent phone label. For example, the label N-aa+V might be used to denote the phone aa when preceded by a nasal and followed by a vowel. These context-dependency conventions are used in the label editor HLEd, and are understood by all HTK tools.
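
The convention can be illustrated with a short Python sketch that splits a label into its left context, base phone and right context. The function name split_context is hypothetical, and the sketch assumes at most one - and one + per label, with the - appearing before the +.

```python
def split_context(label: str):
    """Split 'L-base+R' into (left, base, right); missing parts are None."""
    left = right = None
    base = label
    if "-" in base:
        # Everything before the first '-' is the left context.
        left, base = base.split("-", 1)
    if "+" in base:
        # Everything after the first '+' is the right context.
        base, right = base.split("+", 1)
    return left, base, right


print(split_context("N-aa+V"))  # ('N', 'aa', 'V')
print(split_context("aa+V"))    # (None, 'aa', 'V')
print(split_context("aa"))      # (None, 'aa', None)
```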



ECRL HTK_V2.1: email [email protected]