The HTK label format is text-based. As noted above, a single label file can contain multiple alternatives and multiple levels.
Each line of an HTK label file contains the actual label, optionally preceded by start and end times and optionally followed by a match score.
[start [end] ] name [score] { auxname [auxscore] } [comment]
where start denotes the start time of the labelled segment in 100ns units, end denotes the end time in 100ns units, name is the name of the segment and score is a floating-point confidence score. The optional auxname and auxscore fields carry the names (and scores) of higher-level segments in a multi-level label file, as described below.
All fields except the name are optional. If end is omitted, it is set equal to -1 and ignored. This case would occur with data that had been labelled frame-synchronously. If start and end are both missing, both are set to -1 and the label file is treated as a simple symbolic transcription. The optional score would typically be a log probability generated by a recognition tool. When omitted, the score is set to 0.0.
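The field rules above can be made concrete with a small parser for a single label line. This is a minimal sketch, not HTK's own reader: the class `LabelLine` and function `parse_label_line` are hypothetical names, and the tokenisation rules are assumptions (a leading token counts as a time only if it is an integer, a token after a name counts as a score only if it is numeric, and the trailing comment field is not handled).

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# Assumed tokenisation rules (not taken from HTK itself): a leading token is a
# time only if it is an integer, and a token after a name is a score only if
# it is numeric. The optional trailing comment field is not handled.
_INT = re.compile(r"[-+]?\d+")
_NUM = re.compile(r"[-+]?(?:\d+\.?\d*|\.\d+)(?:[eE][-+]?\d+)?")


@dataclass
class LabelLine:
    start: int = -1      # start time in 100ns units (-1 if absent)
    end: int = -1        # end time in 100ns units (-1 if absent)
    name: str = ""
    score: float = 0.0   # 0.0 when the score is omitted
    aux: List[Tuple[str, Optional[float]]] = field(default_factory=list)


def parse_label_line(line: str) -> LabelLine:
    """Parse '[start [end]] name [score] {auxname [auxscore]}'."""
    toks = line.split()
    lab = LabelLine()
    i = 0
    # Up to two leading integers are the start and end times.
    if i < len(toks) and _INT.fullmatch(toks[i]):
        lab.start = int(toks[i])
        i += 1
        if i < len(toks) and _INT.fullmatch(toks[i]):
            lab.end = int(toks[i])
            i += 1
    if i >= len(toks):
        raise ValueError(f"no label name in {line!r}")
    lab.name = toks[i]
    i += 1
    # An optional numeric score may follow the name.
    if i < len(toks) and _NUM.fullmatch(toks[i]):
        lab.score = float(toks[i])
        i += 1
    # Remaining tokens are auxiliary (higher-level) names, each with an
    # optional numeric score.
    while i < len(toks):
        auxname, auxscore = toks[i], None
        i += 1
        if i < len(toks) and _NUM.fullmatch(toks[i]):
            auxscore = float(toks[i])
            i += 1
        lab.aux.append((auxname, auxscore))
    return lab
```

For example, `parse_label_line("0000000 2200000 ay ice")` returns `LabelLine(start=0, end=2200000, name='ay', score=0.0, aux=[('ice', None)])`. Because label names may be any sequence of characters, a purely numeric name is ambiguous under these rules; a real reader would need a firmer convention.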
The following example corresponds to the transcription shown in part (a) of Fig. 6.1:
0000000 3600000 ice
3600000 8200000 cream
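Since times are given in 100ns units, one second corresponds to 10^7 units. The short sketch below converts the two-word example above to seconds; the helper name `to_seconds` is illustrative only.

```python
# HTK label times are integers in units of 100 ns, so one second is 10**7 units.
UNITS_PER_SECOND = 10_000_000


def to_seconds(t: int) -> float:
    """Convert an HTK label time (100 ns units) to seconds."""
    return t / UNITS_PER_SECOND


example = [(0, 3600000, "ice"), (3600000, 8200000, "cream")]
for start, end, name in example:
    print(f"{name}: {to_seconds(start):.2f} s to {to_seconds(end):.2f} s, "
          f"duration {to_seconds(end - start):.2f} s")
```

This prints `ice: 0.00 s to 0.36 s, duration 0.36 s` and `cream: 0.36 s to 0.82 s, duration 0.46 s`, so the whole utterance spans 0.82 s.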
Multiple levels are described by adding further names alongside the basic name. The lowest level (shortest segments) should be given first, since only the lowest level has start and end times; each higher-level name appears on the first lowest-level segment that it spans. The label file corresponding to the transcription illustrated in part (b) of Fig. 6.1 would be as follows:
0000000 2200000 ay ice
2200000 3600000 s
3600000 4300000 k cream
4300000 5000000 r
5000000 7400000 iy
7400000 8200000 m
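Because a higher-level name is attached only to the first lowest-level segment it covers, the higher-level segment boundaries can be recovered by extending each named segment up to the next one. The sketch below does this for a two-level file; `word_segments` is a hypothetical helper, and it assumes each line has the form `start end name [auxname]` with no scores.

```python
from typing import List, Tuple


def word_segments(lines: List[str]) -> List[Tuple[int, int, str]]:
    """Collapse a two-level label list into (start, end, word) tuples.

    Assumes 'start end phone [word]' lines without scores, where the word
    name appears only on the first phone of each word.
    """
    words: List[list] = []
    for line in lines:
        toks = line.split()
        start, end, names = int(toks[0]), int(toks[1]), toks[2:]
        if len(names) > 1:
            # A higher-level name starts a new word segment.
            words.append([start, end, names[1]])
        elif words:
            # Otherwise extend the current word to this phone's end time.
            words[-1][1] = end
        else:
            raise ValueError("first segment has no higher-level name")
    return [(s, e, w) for s, e, w in words]


example = [
    "0000000 2200000 ay ice",
    "2200000 3600000 s",
    "3600000 4300000 k cream",
    "4300000 5000000 r",
    "5000000 7400000 iy",
    "7400000 8200000 m",
]
print(word_segments(example))
```

The result, `[(0, 3600000, 'ice'), (3600000, 8200000, 'cream')]`, reproduces the single-level listing for part (a), which is a useful consistency check on the multi-level file.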
Finally, multiple alternatives are written as a sequence of separate label lists, separated by lines containing three slashes (///).
The label file corresponding to the transcription illustrated in part (c) of Fig. 6.1 would therefore be as follows:
0000000 2200000 I
2200000 8200000 scream
///
0000000 3600000 ice
3600000 8200000 cream
///
0000000 3600000 eyes
3600000 8200000 cream
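Reading such a file amounts to splitting it at each `///` line. A minimal sketch follows; `split_alternatives` is a hypothetical helper, and it assumes each `///` separator sits on a line of its own, as in the example above.

```python
from typing import List


def split_alternatives(text: str) -> List[List[str]]:
    """Split label-file text into alternatives separated by '///' lines."""
    alternatives: List[List[str]] = [[]]
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue                   # ignore blank lines
        if line == "///":
            alternatives.append([])    # start the next alternative
        else:
            alternatives[-1].append(line)
    return alternatives


example = """\
0000000 2200000 I
2200000 8200000 scream
///
0000000 3600000 ice
3600000 8200000 cream
///
0000000 3600000 eyes
3600000 8200000 cream
"""

for alt in split_alternatives(example):
    # The third field of each line is the label name (no scores here).
    print(" ".join(line.split()[2] for line in alt))
```

This prints the three competing transcriptions `I scream`, `ice cream` and `eyes cream`, one per line.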
Actual label names can be any sequence of characters. However, the - and + characters are reserved for identifying the left and right context, respectively, in a context-dependent phone label. For example, the label N-aa+V might be used to denote the phone aa when preceded by a nasal and followed by a vowel. These context-dependency conventions are used in the label editor HLEd and are understood by all HTK tools.
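Since - and + are reserved, a context-dependent label can be split unambiguously into its left context, central phone and right context. The sketch below is illustrative only: `split_context` is a hypothetical helper and does not reflect how HLEd or other HTK tools are implemented internally.

```python
from typing import Optional, Tuple


def split_context(label: str) -> Tuple[Optional[str], str, Optional[str]]:
    """Split a context-dependent label 'L-P+R' into (L, P, R).

    A missing context is returned as None, so 'aa' gives (None, 'aa', None).
    Relies on '-' and '+' being reserved, so each occurs at most once.
    """
    left: Optional[str] = None
    right: Optional[str] = None
    rest = label
    if "-" in rest:
        left, rest = rest.split("-", 1)   # text before '-' is the left context
    if "+" in rest:
        rest, right = rest.split("+", 1)  # text after '+' is the right context
    return left, rest, right


print(split_context("N-aa+V"))
```

For the label in the text this prints `('N', 'aa', 'V')`; left-only (`N-aa`) and right-only (`aa+V`) labels are handled the same way.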