
5 Speech Input/Output

Many tools need to input parameterised speech data, and HTK provides three methods for doing this:

input from a previously encoded speech parameter file;
input from a waveform file, which is encoded as part of the input processing;
input from an audio device, which is encoded as part of the input processing.

For input from a waveform file, a large number of different file formats are supported, including all of the commonly used CD-ROM formats. Input/output for parameter files is limited to the standard HTK file format and the new Entropic Esignal format.
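As an illustration of what the standard HTK file format looks like to a program, the following Python sketch decodes the header of an HTK parameter file. This section does not spell out the layout, so the details here are assumptions to be checked against the file format description later in the chapter: a 12-byte big-endian header holding the number of samples, the sample period in 100 ns units, the bytes per sample and a parameter kind code, with the base kind in the low six bits and qualifier flags above them. The function name `parse_htk_header` and the lookup tables are hypothetical helpers, not part of HTK.

```python
import struct

# Assumed header layout (big-endian, 12 bytes):
#   nSamples   int32  number of sample vectors in the file
#   sampPeriod int32  sample period in units of 100 ns
#   sampSize   int16  number of bytes per sample vector
#   parmKind   int16  base kind code plus qualifier flag bits
HEADER = struct.Struct(">iihh")

# Assumed base kind codes (low six bits of parmKind).
BASE_KINDS = ["WAVEFORM", "LPC", "LPREFC", "LPCEPSTRA", "LPDELCEP",
              "IREFC", "MFCC", "FBANK", "MELSPEC", "USER", "DISCRETE"]

# Assumed qualifier flag bits (octal) and their name suffixes.
QUALIFIERS = [(0o100, "_E"), (0o200, "_N"), (0o400, "_D"), (0o1000, "_A"),
              (0o2000, "_C"), (0o4000, "_Z"), (0o10000, "_K"), (0o20000, "_0")]


def parse_htk_header(data):
    """Decode the first 12 bytes of an HTK parameter file into a dict."""
    n_samples, samp_period, samp_size, parm_kind = HEADER.unpack(data[:HEADER.size])
    parm_kind &= 0xFFFF                      # treat the kind field as unsigned
    base = parm_kind & 0o77                  # low six bits select the base kind
    name = BASE_KINDS[base] if base < len(BASE_KINDS) else "UNKNOWN(%d)" % base
    name += "".join(suffix for bit, suffix in QUALIFIERS if parm_kind & bit)
    return {
        "n_samples": n_samples,
        "period_ms": samp_period / 10000,    # 100 ns units -> milliseconds
        "bytes_per_sample": samp_size,
        "kind": name,
    }


if __name__ == "__main__":
    # Build a synthetic header rather than reading a real file.
    demo = HEADER.pack(100, 100000, 104, 6 | 0o100 | 0o400)
    print(parse_htk_header(demo))
```

To inspect a real file, the first 12 bytes read in binary mode would be passed to `parse_htk_header`; the sample data that follows the header is not decoded by this sketch.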


All HTK speech input is controlled by configuration parameters that specify which processing operations to apply to each input speech file or audio source. This chapter describes speech input/output in HTK. The general mechanisms are explained and the various configuration parameters are defined. The facilities for signal pre-processing, linear prediction-based processing, Fourier-based processing and vector quantisation are presented, and the supported file formats are given. Also described are the facilities for augmenting the basic speech parameters with energy measures, delta coefficients and acceleration (delta-delta) coefficients, and for splitting each parameter vector into multiple data streams to form observations. The chapter concludes with a brief description of the tools HList and HCopy, which are provided for viewing, manipulating and encoding speech files.
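To make the idea of delta and acceleration coefficients concrete, here is a minimal Python sketch of one common way to compute them: a linear regression over a few frames either side of the current one, with acceleration obtained by applying the same operation to the deltas. The regression formula, the window of two frames and the handling of the file ends (repeating the first and last frames) are assumptions chosen for illustration; this section does not define them, and the function name `deltas` is hypothetical.

```python
def deltas(frames, window=2):
    """Regression-based delta coefficients for a list of feature vectors.

    Each output value is
        sum_{k=1..window} k * (c[t+k] - c[t-k]) / (2 * sum_{k=1..window} k*k)
    Frames beyond either end are replaced by the first or last frame
    (an assumption made for this sketch).
    """
    n = len(frames)
    denom = 2 * sum(k * k for k in range(1, window + 1))
    out = []
    for t in range(n):
        row = []
        for i in range(len(frames[t])):
            acc = 0.0
            for k in range(1, window + 1):
                ahead = frames[min(t + k, n - 1)][i]   # clamp at the last frame
                behind = frames[max(t - k, 0)][i]      # clamp at the first frame
                acc += k * (ahead - behind)
            row.append(acc / denom)
        out.append(row)
    return out


if __name__ == "__main__":
    static = [[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]]   # one coefficient per frame
    delta = deltas(static)
    accel = deltas(delta)                                  # delta-delta coefficients
    # Augmented vectors: static, then delta, then acceleration values.
    augmented = [s + d + a for s, d, a in zip(static, delta, accel)]
    print(augmented[2])
```

Appending the delta and acceleration values to each static vector, as in the last step, is what triples the vector size when both are requested.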


ECRL HTK_V2.1: email [email protected]