13.14.1 Function

HRESULTS is the HTK performance analysis tool. It reads in a set of label files (typically output from a recognition tool such as HVITE) and compares them with the corresponding reference transcription files. For the analysis of speech recognition output, the comparison is based on a Dynamic Programming-based string alignment procedure. For the analysis of word-spotting output, the comparison uses the standard US NIST FOM metric.
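As a rough illustration of what a DP-based string alignment does, the sketch below aligns a recognised label sequence against a reference sequence and counts correct labels (H), deletions (D), substitutions (S) and insertions (I). The function name `align_counts` and the default penalty values are assumptions made for this example; this section does not state the penalties HRESULTS uses internally.

```python
def align_counts(ref, hyp, ins_pen=7, del_pen=7, sub_pen=10):
    """Return (H, D, S, I) for a lowest-penalty alignment of hyp against ref.

    The penalty defaults are illustrative assumptions, not values taken
    from this section.
    """
    n, m = len(ref), len(hyp)
    # best[i][j] = (penalty, H, D, S, I) for aligning ref[:i] with hyp[:j]
    best = [[None] * (m + 1) for _ in range(n + 1)]
    best[0][0] = (0, 0, 0, 0, 0)
    for i in range(1, n + 1):          # reference labels with nothing recognised
        p, h, d, s, ins = best[i - 1][0]
        best[i][0] = (p + del_pen, h, d + 1, s, ins)
    for j in range(1, m + 1):          # recognised labels with no reference
        p, h, d, s, ins = best[0][j - 1]
        best[0][j] = (p + ins_pen, h, d, s, ins + 1)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            p, h, d, s, ins = best[i - 1][j - 1]
            if ref[i - 1] == hyp[j - 1]:
                diag = (p, h + 1, d, s, ins)            # match
            else:
                diag = (p + sub_pen, h, d, s + 1, ins)  # substitution
            p, h, d, s, ins = best[i - 1][j]
            up = (p + del_pen, h, d + 1, s, ins)        # deletion
            p, h, d, s, ins = best[i][j - 1]
            left = (p + ins_pen, h, d, s, ins + 1)      # insertion
            best[i][j] = min(diag, up, left, key=lambda c: c[0])
    return best[n][m][1:]


ref = "the cat sat on the mat".split()
hyp = "the cat sit on mat today".split()
H, D, S, I = align_counts(ref, hyp)
```

For this pair of sequences the lowest-penalty alignment has four matches, one deletion, one substitution and one insertion.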

When used to calculate the sentence accuracy using DP, the basic output is recognition statistics for the whole file set in the format:

   --------------------------- Overall Results -------------------
   SENT:  %Correct=13.00 [H=13, S=87, N=100]
   WORD:  %Corr=53.36, Acc=44.90 [H=460,D=49,S=353,I=73,N=862]
   ===============================================================

The first line gives the sentence-level accuracy, based on the number of label files that are identical to their transcription files. The second line gives the word accuracy, based on the DP matches between the label files and the transcriptions. In this second line, H is the number of correct labels, D is the number of deletions, S is the number of substitutions, I is the number of insertions and N is the total number of labels in the defining transcription files. The percentage of labels correctly recognised is given by

   \[ \%\mathrm{Corr} = \frac{H}{N} \times 100\% \]

and the accuracy is computed by

   \[ \mathrm{Acc} = \frac{H - I}{N} \times 100\% \]

In addition to the standard HTK output format, HRESULTS provides an alternative similar to that used in the US NIST scoring package, i.e.

    |=============================================================|
    |           # Snt |  Corr    Sub    Del    Ins    Err  S. Err |
    |-------------------------------------------------------------|
    | Sum/Avg |   87  |  53.36  40.95   5.68   8.47  55.10  87.00 |
    `-------------------------------------------------------------'
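The figures in both output formats can be reproduced from the raw counts. The following is a minimal sketch, with `word_stats` a hypothetical helper name; it assumes the NIST-style columns are each count expressed as a percentage of N, which is consistent with the example figures shown above.

```python
def word_stats(H, D, S, I, N):
    """Percentages derived from the DP alignment counts."""
    return {
        "Corr": 100.0 * H / N,            # labels correctly recognised
        "Acc": 100.0 * (H - I) / N,       # correct labels penalised by insertions
        "Sub": 100.0 * S / N,
        "Del": 100.0 * D / N,
        "Ins": 100.0 * I / N,
        "Err": 100.0 * (S + D + I) / N,   # total word error
    }


# Counts taken from the WORD line of the example output.
stats = word_stats(H=460, D=49, S=353, I=73, N=862)
for name, value in stats.items():
    print(f"{name}: {value:.2f}")
```

With the example counts this gives Corr 53.36 and Acc 44.90, matching the WORD line, and Sub 40.95, Del 5.68, Ins 8.47 and Err 55.10, matching the NIST-style table.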

Optional extra outputs available from HRESULTS are

recognition statistics on a per file basis
recognition statistics on a per speaker basis
recognition statistics from best of N alternatives
time-aligned transcriptions
confusion matrices

For comparison purposes, it is also possible to assign two labels to the same equivalence class (see the -e option). In addition, the null label ??? is defined so that making any label equivalent to the null label causes it to be ignored in the matching process. Note that the order of the equivalence labels is important: to ensure that label X is ignored, the command line option -e ??? X would be used. Label files containing triphone labels of the form A-B+C can optionally be stripped down to just the class name B via the -s switch.
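The effect of equivalence classes, the null label and context stripping can be pictured as a label normalisation pass applied before matching. The sketch below is illustrative only: the names `strip_context` and `normalise` are hypothetical, and applying stripping before equivalence mapping is an assumption rather than documented HRESULTS behaviour.

```python
NULL_LABEL = "???"


def strip_context(label):
    """Reduce a context-dependent label of the form A-B+C to B."""
    if "-" in label:
        label = label.split("-", 1)[1]   # drop left context "A-"
    if "+" in label:
        label = label.split("+", 1)[0]   # drop right context "+C"
    return label


def normalise(labels, equiv=(), strip=False):
    """Apply equivalences given as (class_name, member) pairs.

    Each pair mirrors the argument order of "-e class_name member", so
    ("???", "X") causes X to be ignored. Stripping first is an assumption.
    """
    mapping = {member: cls for cls, member in equiv}
    out = []
    for lab in labels:
        if strip:
            lab = strip_context(lab)
        lab = mapping.get(lab, lab)
        if lab != NULL_LABEL:            # labels mapped to the null label are dropped
            out.append(lab)
    return out


labels = ["sil", "k-ae+t", "sp", "ax"]
cleaned = normalise(labels, equiv=[("???", "sil"), ("???", "sp")], strip=True)
```

Here `cleaned` contains only `ae` and `ax`, since `sil` and `sp` were made equivalent to the null label.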

The word spotting mode of scoring can be used to calculate hits, false alarms and the associated figure of merit for each of a set of keywords. Optionally, it can also calculate ROC information over a range of false alarm rates. A typical output is as follows:

------------------------ Figures of Merit -------------------------
      KeyWord:    #Hits     #FAs  #Actual      FOM
            A:        8        1       14    30.54
            B:        4        2       14    15.27
      Overall:       12        3       28    22.91
-------------------------------------------------------------------

which shows the number of hits and false alarms (FA) for two keywords A and B. A label in the test file with start time $t_s$ and end time $t_e$ constitutes a hit if there is a corresponding label in the reference file such that

   \[ t_s < t_m < t_e \]

where $t_m$ is the mid-point of the reference label.
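The hit criterion can be written as a small predicate: a test label is a hit when the mid-point of a reference label for the same keyword falls between the test label's start and end times. In this sketch the tuple layout `(keyword, start, end)` and the function name `is_hit` are assumptions made for illustration.

```python
def is_hit(test_label, ref_label):
    """Return True if test_label spans the mid-point of ref_label.

    Both labels are (keyword, start, end) tuples with times in the same
    units; this layout is assumed for the example.
    """
    t_word, t_start, t_end = test_label
    r_word, r_start, r_end = ref_label
    if t_word != r_word:
        return False                      # only labels for the same keyword correspond
    r_mid = 0.5 * (r_start + r_end)       # mid-point of the reference label
    return t_start < r_mid < t_end


reference = ("A", 1.00, 1.40)             # mid-point at 1.20
print(is_hit(("A", 1.10, 1.50), reference))   # overlaps the mid-point
print(is_hit(("A", 1.25, 1.60), reference))   # starts after the mid-point
```

The first test label covers the reference mid-point and counts as a hit; the second starts after it and would be scored as a false alarm.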

Note that for keyword scoring, the test transcriptions must include a score with each labelled word spot and all transcriptions must include boundary time information.

The FOM gives the percentage of hits averaged over the range 1 to 10 FAs per hour. It is calculated by first ordering all spots for a particular keyword according to their match score. Then, for each FA rate f, the number of hits is counted starting from the top of the ordered list and stopping when f false alarms have been encountered. This corresponds to a posteriori setting of the keyword detection threshold and effectively gives an upper bound on keyword spotting performance.
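The counting step described above can be sketched as follows. This is a simplified illustration rather than the exact HRESULTS computation: it assumes that higher scores are better, that the test data lasts exactly one hour (so that f false alarms corresponds to f FAs per hour), and it takes a plain unweighted average over the integer FA counts. The names `hits_before_fa` and `simple_fom` are hypothetical.

```python
def hits_before_fa(spots, n_fa):
    """Count hits from the top of the score-ordered list until n_fa false alarms.

    spots is a list of (score, is_hit) pairs for one keyword.
    """
    ordered = sorted(spots, key=lambda s: s[0], reverse=True)
    hits = fas = 0
    for _score, hit in ordered:
        if hit:
            hits += 1
        else:
            fas += 1
            if fas == n_fa:
                break                     # threshold reached for this FA rate
    return hits


def simple_fom(spots, n_actual, max_fa=10):
    """Average hit percentage over 1..max_fa false alarms (one hour assumed)."""
    rates = [hits_before_fa(spots, f) / n_actual for f in range(1, max_fa + 1)]
    return 100.0 * sum(rates) / len(rates)


spots = [(0.9, True), (0.8, False), (0.7, True),
         (0.6, True), (0.5, False), (0.4, True)]
fom = simple_fom(spots, n_actual=5, max_fa=2)
```

Because the threshold is chosen after seeing which spots are hits, the figure is optimistic in the way the text describes.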

ECRL HTK_V2.1: email [email protected]