This section summarises the file formats, parameter kinds, qualifiers and configuration parameters used by HTK. Table 5.1 lists the speech audio file formats which can be read by the HWAVE module. Table 5.2 lists the basic parameter kinds supported by the HPARM module, and Fig. 5.8 shows the automatic conversions that can be performed by appropriate choice of source and target parameter kinds. Table 5.3 lists the available qualifiers for parameter kinds. The qualifiers other than _C and _K are used to describe the target kind. The source kind may already have some of these; HPARM adds the rest as needed. Note that HPARM can also delete qualifiers when converting from source to target. The _C and _K qualifiers are used only in external files, to indicate compression and an attached checksum respectively. HPARM adds these two qualifiers to the target form during output, and only in response to setting the configuration parameters SAVECOMPRESSED and SAVEWITHCRC. Adding the _C or _K qualifiers to the target kind simply causes an error. Finally, Tables 5.4 and 5.5 list all of the configuration parameters along with their meanings and default values.
Name | Description |
HTK | The standard HTK file format |
TIMIT | As used in the original prototype TIMIT CD-ROM |
NIST | The standard SPHERE format used by the US NIST |
SCRIBE | Subset of the European SAM standard used in the SCRIBE CD-ROM |
SDES1 | The Sound Designer 1 format defined by Digidesign Inc. |
AIFF | Audio interchange file format |
SUNAU8 | Subset of 8-bit ".au" and ".snd" formats used by Sun and NeXT |
OGI | Format used by Oregon Graduate Institute, similar to TIMIT |
WAVE | Microsoft WAVE files used on PCs |
ESIG | Entropic Esignal file format |
AUDIO | Pseudo format to indicate direct audio input |
ALIEN | Pseudo format to indicate an unsupported file; the alien header size must be set via the environment variable HDSIZE |
NOHEAD | As for the ALIEN format but header size is zero |
Kind | Meaning |
WAVEFORM | scalar samples (usually raw speech data) |
LPC | linear prediction coefficients |
LPREFC | linear prediction reflection coefficients |
LPCEPSTRA | LP derived cepstral coefficients |
LPDELCEP | LP cepstra + delta coef (obsolete) |
IREFC | LPREFC stored as 16-bit (short) integers |
MFCC | mel-frequency cepstral coefficients |
FBANK | log filter-bank parameters |
MELSPEC | linear filter-bank parameters |
USER | user defined parameters |
DISCRETE | vector quantised codebook symbols |
ANON | matches actual parameter kind |
Qualifier | Meaning |
_A | Acceleration coefficients appended |
_C | External form is compressed |
_D | Delta coefficients appended |
_E | Log energy appended |
_K | External form has checksum appended |
_N | Absolute log energy suppressed |
_V | VQ index appended |
_Z | Cepstral mean subtracted |
_0 | Cepstral C0 coefficient appended |
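The way a base kind from Table 5.2 combines with qualifiers from Table 5.3 can be illustrated with a short sketch. The Python below is not part of HTK; the function name `parse_kind` and its `is_target` flag are hypothetical, and the sketch assumes that a full kind is written as the base name followed by its qualifiers, for example `MFCC_E_D_A`. It checks each part against the two tables and rejects _C and _K when the string is meant as a target kind, mirroring the rule stated above.

```python
# Base kinds from Table 5.2 and qualifier letters from Table 5.3.
BASE_KINDS = {
    "WAVEFORM", "LPC", "LPREFC", "LPCEPSTRA", "LPDELCEP", "IREFC",
    "MFCC", "FBANK", "MELSPEC", "USER", "DISCRETE", "ANON",
}
QUALIFIERS = set("ACDEKNVZ0")
EXTERNAL_ONLY = {"C", "K"}  # compression and checksum: external files only


def parse_kind(kind, is_target=False):
    """Split a kind string such as 'MFCC_E_D_A' into (base, qualifiers).

    Hypothetical helper for illustration; not an HTK function.
    """
    base, *quals = kind.upper().split("_")
    if base not in BASE_KINDS:
        raise ValueError(f"unknown base kind: {base}")
    for q in quals:
        if q not in QUALIFIERS:
            raise ValueError(f"unknown qualifier: _{q}")
        if is_target and q in EXTERNAL_ONLY:
            # Per the text, _C and _K in a target kind are an error.
            raise ValueError(f"_{q} is not allowed in a target kind")
    return base, set(quals)


base, quals = parse_kind("MFCC_E_D_A", is_target=True)
print(base, sorted(quals))
```

The qualifiers are returned as a set because this sketch treats their order as irrelevant; that is a simplification made here, not a statement about how HPARM handles them.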
Module | Name | Default | Description |
HAUDIO | LINEIN | T | Select line input for audio |
HAUDIO | MICIN | F | Select microphone input for audio |
HAUDIO | LINEOUT | T | Select line output for audio |
HAUDIO | SPEAKEROUT | F | Select speaker output for audio |
HAUDIO | PHONESOUT | T | Select headphones output for audio |
| SOURCEKIND | ANON | Parameter kind of source |
| SOURCEFORMAT | HTK | File format of source |
| SOURCERATE | 0.0 | Sample period of source in 100ns units |
HWAVE | NSAMPLES | | Num samples in alien file input via a pipe |
HWAVE | HEADERSIZE | | Size of header in an alien file |
HWAVE | STEREOMODE | | Select channel: RIGHT or LEFT |
HWAVE | BYTEORDER | | Define byte order VAX or other |
| NATURALREADORDER | F | Enable natural read order for HTK files |
| NATURALWRITEORDER | F | Enable natural write order for HTK files |
| TARGETKIND | ANON | Parameter kind of target |
| TARGETFORMAT | HTK | File format of target |
| TARGETRATE | 0.0 | Sample period of target in 100ns units |
HPARM | SAVECOMPRESSED | F | Save the output file in compressed form |
HPARM | SAVEWITHCRC | T | Attach a checksum to output parameter file |
HPARM | ADDDITHER | 0.0 | Level of noise added to input signal |
HPARM | ZMEANSOURCE | F | Zero mean source waveform before analysis |
HPARM | WINDOWSIZE | 256000.0 | Analysis window size in 100ns units |
HPARM | USEHAMMING | T | Use a Hamming window |
HPARM | PREEMCOEF | 0.97 | Set pre-emphasis coefficient |
HPARM | LPCORDER | 12 | Order of LPC analysis |
HPARM | NUMCHANS | 20 | Number of filterbank channels |
HPARM | LOFREQ | -1.0 | Low frequency cut-off in fbank analysis |
HPARM | HIFREQ | -1.0 | High frequency cut-off in fbank analysis |
HPARM | USEPOWER | F | Use power not magnitude in fbank analysis |
HPARM | NUMCEPS | 12 | Number of cepstral parameters |
HPARM | CEPLIFTER | 22 | Cepstral liftering coefficient |
HPARM | ENORMALISE | T | Normalise log energy |
HPARM | ESCALE | 0.1 | Scale log energy |
HPARM | SILFLOOR | 50.0 | Energy silence floor (dB) |
HPARM | DELTAWINDOW | 2 | Delta window size |
HPARM | ACCWINDOW | 2 | Acceleration window size |
HPARM | VQTABLE | NULL | Name of VQ table |
HPARM | SAVEASVQ | F | Save only the VQ indices |
HPARM | AUDIOSIG | 0 | Audio signal number for remote control |
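Several parameters in the table above (SOURCERATE, TARGETRATE, WINDOWSIZE) are expressed in 100ns units, which is easy to misread. The following Python sketch shows the arithmetic only; the helper names are hypothetical, the 16 kHz sampling rate is an illustrative assumption, and the truncation used in `samples_per_window` is a choice made for this sketch, not a statement about how HPARM rounds.

```python
UNITS_PER_SECOND = 10_000_000  # one time unit is 100 ns


def hz_to_period(sample_rate_hz):
    """Sample period in 100ns units for a sampling rate given in Hz."""
    return UNITS_PER_SECOND / sample_rate_hz


def units_to_ms(units):
    """Convert a duration in 100ns units to milliseconds."""
    return units / 10_000.0


def samples_per_window(window_units, source_period_units):
    """Whole samples in a window (truncating; an assumption of this sketch)."""
    return int(window_units // source_period_units)


period = hz_to_period(16000)                 # assumed 16 kHz source
print(period)                                # 625.0
print(units_to_ms(256000.0))                 # default WINDOWSIZE: 25.6
print(samples_per_window(256000.0, period))  # 409
```

Read this way, the default WINDOWSIZE of 256000.0 corresponds to a 25.6 ms analysis window.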
Module | Name | Default | Description |
HPARM | USESILDET | F | Enable speech/silence detector |
HPARM | MEASURESIL | T | Measure background noise level prior to sampling |
HPARM | OUTSILWARN | T | Print a warning message to stdout before measuring audio levels |
HPARM | SPEECHTHRESH | 9.0 | Threshold for speech above silence level (dB) |
HPARM | SILENERGY | 0.0 | Average background noise level (dB) |
HPARM | SPCSEQCOUNT | 10 | Window over which speech/silence decision reached |
HPARM | SPCGLCHCOUNT | 0 | Maximum number of frames marked as silence in window which is classified as speech whilst expecting start of speech |
HPARM | SILSEQCOUNT | 100 | Number of frames classified as silence needed to mark end of utterance |
HPARM | SILGLCHCOUNT | 2 | Maximum number of frames marked as silence in window which is classified as speech whilst expecting silence |
HPARM | SILMARGIN | 40 | Number of extra frames included before and after start and end of speech marks from the speech/silence detector |
HPARM | V1COMPAT | F | Set Version 1.5 compatibility mode |
| TRACE | 0 | Trace setting |
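To show how the parameters in Tables 5.4 and 5.5 might be collected into a configuration, the Python sketch below renders a small dictionary as text lines. It assumes the `NAME = VALUE` line syntax with an optional `MODULE:` prefix, and the T/F spelling of booleans shown in the Default column; `render_config` is a hypothetical helper, and the values chosen (WAVE input, MFCC_E_D_A target, a TARGETRATE of 100000.0) are illustrative only.

```python
def render_config(params):
    """Render {(module, name): value} as configuration lines.

    Hypothetical helper; the 'MODULE: NAME = VALUE' layout is an
    assumption of this sketch.
    """
    lines = []
    for (module, name), value in params.items():
        if isinstance(value, bool):
            value = "T" if value else "F"  # booleans as in the Default column
        prefix = f"{module}: " if module else ""
        lines.append(f"{prefix}{name} = {value}")
    return "\n".join(lines)


config = {
    (None, "SOURCEFORMAT"): "WAVE",
    (None, "TARGETKIND"): "MFCC_E_D_A",
    (None, "TARGETRATE"): 100000.0,   # 10 ms in 100ns units
    ("HPARM", "USESILDET"): True,     # enable speech/silence detector
    ("HPARM", "SILMARGIN"): 40,       # table default, written explicitly
}
print(render_config(config))
```

Keying the dictionary on a (module, name) pair keeps parameters that have no module entry in the tables, such as TARGETKIND, distinct from those listed under a single module.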