Next: 6 Transcriptions and Label Files Up: 5 Speech Input/Output Previous: 5.14 Version 1.5 Compatibility

5.15 Summary

 

  This section summarises the file formats, parameter kinds, qualifiers and configuration parameters used by HTK. Table 5.1 lists the audio speech file formats which can be read by the HWAVE module. Table 5.2 lists the basic parameter kinds supported by the HPARM module, and Fig. 5.8 shows the automatic conversions that can be performed by appropriate choice of source and target parameter kinds. Table 5.3 lists the available qualifiers for parameter kinds. All of these except _C and _K are used to describe the target kind. The source kind may already have some of them; HPARM adds the rest as needed. Note that HPARM can also delete qualifiers when converting from source to target. The _C and _K qualifiers are used only in external files, to indicate compression and an attached checksum respectively. HPARM adds them to the target form during output, and only in response to setting the configuration parameters SAVECOMPRESSED and SAVEWITHCRC. Adding the _C or _K qualifiers to the target kind explicitly simply causes an error. Finally, Tables 5.4 and 5.5 list all of the configuration parameters along with their meanings and default values.

Name    Description
HTK     The standard HTK file format
TIMIT   As used in the original prototype TIMIT CD-ROM
NIST    The standard SPHERE format used by the US NIST
SCRIBE  Subset of the European SAM standard used in the SCRIBE CD-ROM
SDES1   The Sound Designer 1 format defined by Digidesign Inc.
AIFF    Audio interchange file format
SUNAU8  Subset of 8-bit ".au" and ".snd" formats used by Sun and NeXT
OGI     Format used by Oregon Graduate Institute, similar to TIMIT
WAVE    Microsoft WAVE files used on PCs
ESIG    Entropic Esignal file format
AUDIO   Pseudo format to indicate direct audio input
ALIEN   Pseudo format to indicate unsupported file; the alien header size must be set via the environment variable HDSIZE
NOHEAD  As for the ALIEN format but header size is zero

  Table 5.1: Supported file formats

Kind       Meaning
WAVEFORM   scalar samples (usually raw speech data)
LPC        linear prediction coefficients
LPREFC     linear prediction reflection coefficients
LPCEPSTRA  LP derived cepstral coefficients
LPDELCEP   LP cepstra + delta coefficients (obsolete)
IREFC      LPREFC stored as 16-bit (short) integers
MFCC       mel-frequency cepstral coefficients
FBANK      log filter-bank parameters
MELSPEC    linear filter-bank parameters
USER       user defined parameters
DISCRETE   vector quantised codebook symbols
ANON       matches actual parameter kind

  Table 5.2: Supported parameter kinds

Qualifier  Meaning
_A         Acceleration coefficients appended
_C         External form is compressed
_D         Delta coefficients appended
_E         Log energy appended
_K         External form has checksum appended
_N         Absolute log energy suppressed
_V         VQ index appended
_Z         Cepstral mean subtracted
_0         Cepstral C0 coefficient appended

  Table 5.3: Parameter kind qualifiers
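
To make the qualifier rules concrete, the following is a minimal Python sketch that splits a parameter kind string such as MFCC_E_D_A into its basic kind and qualifiers, and rejects a target kind that carries _C or _K. It is not part of HTK: the function names `parse_kind` and `check_target_kind` are hypothetical, and the sketch assumes that qualifiers are simply appended to the basic kind with underscores, as in the table entries above.

```python
# Hypothetical helpers for illustration only; HTK itself does this inside HPARM.
BASE_KINDS = {
    "WAVEFORM", "LPC", "LPREFC", "LPCEPSTRA", "LPDELCEP", "IREFC",
    "MFCC", "FBANK", "MELSPEC", "USER", "DISCRETE", "ANON",
}
QUALIFIERS = {"A", "C", "D", "E", "K", "N", "V", "Z", "0"}
# Qualifiers that only describe the external (file) form.
EXTERNAL_ONLY = {"C", "K"}


def parse_kind(kind):
    """Split e.g. 'MFCC_E_D_A' into ('MFCC', ['E', 'D', 'A'])."""
    base, *quals = kind.upper().split("_")
    if base not in BASE_KINDS:
        raise ValueError(f"unknown basic parameter kind: {base!r}")
    for q in quals:
        if q not in QUALIFIERS:
            raise ValueError(f"unknown qualifier: _{q}")
    return base, quals


def check_target_kind(kind):
    """Parse a target kind, refusing the external-only qualifiers _C and _K."""
    base, quals = parse_kind(kind)
    bad = [q for q in quals if q in EXTERNAL_ONLY]
    if bad:
        names = ", ".join("_" + q for q in bad)
        raise ValueError(f"{names} may not be given in a target kind")
    return base, quals


if __name__ == "__main__":
    print(check_target_kind("MFCC_E_D_A"))
```

In this sketch a source kind read from a file header could be passed to `parse_kind`, which accepts _C and _K, whereas a user-supplied target kind would go through `check_target_kind`.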

 

Module  Name               Default   Description
HAUDIO  LINEIN             T         Select line input for audio
HAUDIO  MICIN              F         Select microphone input for audio
HAUDIO  LINEOUT            T         Select line output for audio
HAUDIO  SPEAKEROUT         F         Select speaker output for audio
HAUDIO  PHONESOUT          T         Select headphones output for audio
        SOURCEKIND         ANON      Parameter kind of source
        SOURCEFORMAT       HTK       File format of source
        SOURCERATE         0.0       Sample period of source in 100ns units
HWAVE   NSAMPLES                     Num samples in alien file input via a pipe
HWAVE   HEADERSIZE                   Size of header in an alien file
HWAVE   STEREOMODE                   Select channel: RIGHT or LEFT
HWAVE   BYTEORDER                    Define byte order VAX or other
        NATURALREADORDER   F         Enable natural read order for HTK files
        NATURALWRITEORDER  F         Enable natural write order for HTK files
        TARGETKIND         ANON      Parameter kind of target
        TARGETFORMAT       HTK       File format of target
        TARGETRATE         0.0       Sample period of target in 100ns units
HPARM   SAVECOMPRESSED     F         Save the output file in compressed form
HPARM   SAVEWITHCRC        T         Attach a checksum to output parameter file
HPARM   ADDDITHER          0.0       Level of noise added to input signal
HPARM   ZMEANSOURCE        F         Zero mean source waveform before analysis
HPARM   WINDOWSIZE         256000.0  Analysis window size in 100ns units
HPARM   USEHAMMING         T         Use a Hamming window
HPARM   PREEMCOEF          0.97      Set pre-emphasis coefficient
HPARM   LPCORDER           12        Order of LPC analysis
HPARM   NUMCHANS           20        Number of filterbank channels
HPARM   LOFREQ             -1.0      Low frequency cut-off in fbank analysis
HPARM   HIFREQ             -1.0      High frequency cut-off in fbank analysis
HPARM   USEPOWER           F         Use power not magnitude in fbank analysis
HPARM   NUMCEPS            12        Number of cepstral parameters
HPARM   CEPLIFTER          22        Cepstral liftering coefficient
HPARM   ENORMALISE         T         Normalise log energy
HPARM   ESCALE             0.1       Scale log energy
HPARM   SILFLOOR           50.0      Energy silence floor (dB)
HPARM   DELTAWINDOW        2         Delta window size
HPARM   ACCWINDOW          2         Acceleration window size
HPARM   VQTABLE            NULL      Name of VQ table
HPARM   SAVEASVQ           F         Save only the VQ indices
HPARM   AUDIOSIG           0         Audio signal number for remote control

  Table 5.4: Configuration parameters
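
Several of the parameters above (SOURCERATE, TARGETRATE, WINDOWSIZE) are expressed in 100ns units, which can be awkward to read. The Python sketch below converts such values to more familiar quantities. The helper names are hypothetical and the 16 kHz sampling rate is an assumed example, not a default taken from the table. The default WINDOWSIZE of 256000.0 corresponds to 25.6 ms.

```python
# Illustrative helpers (not part of HTK) for times expressed in 100 ns units.
UNITS_PER_SECOND = 10_000_000  # 1 s = 10^7 units of 100 ns
UNITS_PER_MS = 10_000          # 1 ms = 10^4 units of 100 ns


def units_to_ms(t_units):
    """Convert a duration in 100 ns units to milliseconds."""
    return t_units / UNITS_PER_MS


def sample_rate_to_period(rate_hz):
    """Convert a sampling rate in Hz to a sample period in 100 ns units."""
    return UNITS_PER_SECOND / rate_hz


def samples_per_window(window_units, period_units):
    """Number of sample periods spanned by a window (not rounded here)."""
    return window_units / period_units


if __name__ == "__main__":
    window = 256000.0                       # default WINDOWSIZE from the table
    period = sample_rate_to_period(16000)   # assumed 16 kHz source
    print("window length (ms):", units_to_ms(window))
    print("sample period (100ns units):", period)
    print("samples per window:", samples_per_window(window, period))
```

The sketch deliberately leaves the samples-per-window figure unrounded, since how a fractional number of samples is handled is not stated in this section.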

Module  Name               Default   Description
HPARM   USESILDET          F         Enable speech/silence detector
HPARM   MEASURESIL         T         Measure background noise level prior to sampling
HPARM   OUTSILWARN         T         Print a warning message to stdout before measuring audio levels
HPARM   SPEECHTHRESH       9.0       Threshold for speech above silence level (dB)
HPARM   SILENERGY          0.0       Average background noise level (dB)
HPARM   SPCSEQCOUNT        10        Window over which speech/silence decision reached
HPARM   SPCGLCHCOUNT       0         Maximum number of frames marked as silence in window which is classified as speech whilst expecting start of speech
HPARM   SILSEQCOUNT        100       Number of frames classified as silence needed to mark end of utterance
HPARM   SILGLCHCOUNT       2         Maximum number of frames marked as silence in window which is classified as speech whilst expecting silence
HPARM   SILMARGIN          40        Number of extra frames included before and after start and end of speech marks from the speech/silence detector
HPARM   V1COMPAT           F         Set Version 1.5 compatibility mode
        TRACE              0         Trace setting

  Table 5.5: Configuration parameters (cont.)
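
As an illustration of how the parameters in these tables might be collected into a configuration file, the Python sketch below renders a small set of settings as text. It assumes the usual HTK configuration syntax of one `NAME = VALUE` entry per line with an optional `MODULE:` prefix and booleans written as T or F; this syntax is not described in this section. The function `render_config` and the chosen values are hypothetical.

```python
# Hypothetical helper: renders settings as HTK-style configuration text.
# Assumes the "[MODULE:] NAME = VALUE" line syntax with T/F booleans.
def render_config(params):
    """params maps (module_or_None, name) to a bool, number or string."""
    lines = []
    for (module, name), value in params.items():
        if isinstance(value, bool):
            text = "T" if value else "F"
        else:
            text = str(value)
        prefix = f"{module}: " if module else ""
        lines.append(f"{prefix}{name} = {text}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    example = {
        (None, "SOURCEFORMAT"): "WAVE",
        (None, "TARGETKIND"): "MFCC_E_D_A",
        ("HPARM", "SAVECOMPRESSED"): True,   # causes _C to be added on output
        ("HPARM", "SAVEWITHCRC"): True,      # causes _K to be added on output
        ("HPARM", "NUMCEPS"): 12,
    }
    print(render_config(example), end="")
```

Note that compression and the checksum are requested through SAVECOMPRESSED and SAVEWITHCRC rather than by writing _C or _K into TARGETKIND.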



ECRL HTK_V2.1: email [email protected]