
5.3 Linear Prediction Analysis

 

In linear prediction (LP) analysis, the vocal tract transfer function is modelled by an all-pole filter with transfer function

$$H(z) = \frac{1}{\sum_{i=0}^{p} a_i z^{-i}}$$

where $p$ is the number of poles and $a_0 \equiv 1$. The filter coefficients $\{a_i\}$ are chosen to minimise the mean square filter prediction error summed over the analysis window. The HTK module HSIGP uses the autocorrelation method to perform this optimisation as follows.
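The following sketch makes the minimised quantity concrete. It computes the summed squared prediction error $e_n = \sum_{i=0}^{p} a_i s_{n-i}$ for a given coefficient set. Following the autocorrelation method, it assumes that samples outside the window are zero, so the error is summed over every $n$ at which it can be non-zero. The function name `prediction_error_energy` is hypothetical and is not part of HSIGP.

```python
from typing import Sequence


def prediction_error_energy(s: Sequence[float], a: Sequence[float]) -> float:
    """Sum of e_n^2 with e_n = sum_{i=0}^{p} a[i] * s[n-i].

    Samples outside the window are treated as zero (autocorrelation-method
    assumption), so n runs over 0 .. len(s) + p - 1.
    """
    if a[0] != 1.0:
        raise ValueError("a[0] must be 1 (a_0 == 1 in the all-pole model)")
    p = len(a) - 1
    total = 0.0
    for n in range(len(s) + p):
        e = 0.0
        for i in range(p + 1):
            if 0 <= n - i < len(s):      # zero outside the analysis window
                e += a[i] * s[n - i]
        total += e * e
    return total
```

Take the two-sample window $s = (1, 2)$. Choosing $a_1 = -0.5$ gives an error of 4.25. Choosing $a_1 = -0.4$ gives 4.2, which is the first-order optimum for this window: $r_0 = 5$, $r_1 = 2$ and $a_1 = -r_1/r_0$.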

Given a window of speech samples $\{s_n, n = 1, \ldots, N\}$, the first $p+1$ terms of the autocorrelation sequence are calculated from

$$r_i = \sum_{j=1}^{N-i} s_j s_{j+i}$$

where $i = 0, \ldots, p$. The filter coefficients are then computed recursively using two quantities. The first is a set of auxiliary coefficients $\{k_i\}$, which can be interpreted as the reflection coefficients of an equivalent acoustic tube. The second is the prediction error $E$, which is initially equal to $r_0$. Let $\{k_j^{(i-1)}\}$ and $\{a_j^{(i-1)}\}$ be the reflection and filter coefficients for a filter of order $i-1$; then a filter of order $i$ can be calculated in three steps. Firstly, a new set of reflection coefficients is calculated:

$$k_j^{(i)} = k_j^{(i-1)}$$

for $j = 1, \ldots, i-1$ and

$$k_i^{(i)} = \frac{r_i + \sum_{j=1}^{i-1} a_j^{(i-1)} r_{i-j}}{E^{(i-1)}}$$

Secondly, the prediction energy is updated:

$$E^{(i)} = \left(1 - k_i^{(i)} k_i^{(i)}\right) E^{(i-1)}$$

Finally, new filter coefficients are computed:

$$a_j^{(i)} = a_j^{(i-1)} - k_i^{(i)} a_{i-j}^{(i-1)}$$

for $j = 1, \ldots, i-1$ and

$$a_i^{(i)} = -k_i^{(i)}$$

This process is repeated from i=1 through to the required filter order i=p.
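The three steps translate almost line for line into code. This is a minimal Python sketch, not HTK's own implementation. The names `autocorrelation` and `durbin` are hypothetical. Lists are 0-based, with `a[0] = 1` and `k[0]` unused, so that the remaining indices match the equations. The window is assumed to have non-zero energy ($r_0 > 0$).

```python
from typing import List, Sequence, Tuple


def autocorrelation(s: Sequence[float], p: int) -> List[float]:
    """r_i = sum_j s_j * s_{j+i} for i = 0..p (0-based samples)."""
    n = len(s)
    return [sum(s[j] * s[j + i] for j in range(n - i)) for i in range(p + 1)]


def durbin(r: Sequence[float], p: int) -> Tuple[List[float], List[float], float]:
    """Return (a, k, E): a[0..p] with a[0] == 1, reflection coefficients
    k[1..p] (k[0] unused) and the final prediction error E^(p)."""
    if r[0] <= 0.0:
        raise ValueError("r[0] must be positive (zero-energy window?)")
    a = [1.0] + [0.0] * p
    k = [0.0] * (p + 1)
    E = r[0]                                   # E^(0) = r_0
    for i in range(1, p + 1):
        # Step 1: new reflection coefficient k_i; k_1..k_{i-1} are kept.
        acc = r[i]
        for j in range(1, i):
            acc += a[j] * r[i - j]
        ki = acc / E
        k[i] = ki
        # Step 2: update the prediction error energy.
        E *= 1.0 - ki * ki
        # Step 3: order-i filter coefficients from the order-(i-1) set.
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] - ki * a[i - j]
        new_a[i] = -ki
        a = new_a
    return a, k, E
```

In the terms used in the next paragraph, `a[1:]` corresponds to the LPC parameters and `k[1:]` to the LPREFC parameters. The result can be checked against the normal equations of the autocorrelation method: $\sum_{j=0}^{p} a_j r_{|i-j|} = 0$ for $i = 1, \ldots, p$, and $E^{(p)} = \sum_{j=0}^{p} a_j r_j$.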

To effect the above transformation, the target parameter kind must be set to either LPC to obtain the LP filter parameters $\{a_i\}$ or LPREFC to obtain the reflection coefficients $\{k_i\}$. The required filter order must also be set using the configuration parameter LPCORDER. Thus, for example, the following configuration settings would produce a target parameterisation consisting of 12 reflection coefficients per vector.

    TARGETKIND = LPREFC
    LPCORDER = 12

An alternative LPC-based parameterisation is obtained by setting the target kind to LPCEPSTRA to generate linear prediction cepstra. The cepstrum of a signal is computed by taking a Fourier (or similar) transform of the log spectrum. For linear prediction cepstra, the required spectrum is the linear prediction spectrum, which can be obtained from the Fourier transform of the filter coefficients. However, it can be shown that the required cepstra can be computed more efficiently using a simple recursion:

$$c_n = -a_n - \frac{1}{n} \sum_{i=1}^{n-1} (n-i)\, a_i\, c_{n-i}$$

The number of cepstra generated need not be the same as the number of filter coefficients, hence it is set by a separate configuration parameter called NUMCEPS .
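The recursion is cheap to implement. The Python sketch below uses hypothetical function names and 0-based lists with `a[0] = 1` and `c[0]` left unused. It computes $c_1, \ldots, c_{\mathrm{NUMCEPS}}$ with the recursion. As a cross-check of the "it can be shown" claim only, it also computes them as an inverse DFT of the log LP power spectrum $-\log|A(e^{j\omega})|^2$. The sketch makes two assumptions:

- $a_n$ is taken as zero for $n > p$.
- The filter is minimum phase, which the autocorrelation method guarantees.

Under these assumptions the two routes agree up to negligible aliasing.

```python
import cmath
import math
from typing import List, Sequence


def lpc_to_cepstra(a: Sequence[float], num_ceps: int) -> List[float]:
    """LP cepstra c[1..num_ceps] from a[0..p] (a[0] == 1) using
    c_n = -a_n - (1/n) * sum_{i=1}^{n-1} (n - i) * a_i * c_{n-i}."""
    p = len(a) - 1
    # An order-p all-pole model has a_n = 0 for n > p, so pad with zeros.
    ext = list(a) + [0.0] * max(0, num_ceps - p)
    c = [0.0] * (num_ceps + 1)                 # c[0] is not computed here
    for n in range(1, num_ceps + 1):
        acc = sum((n - i) * ext[i] * c[n - i] for i in range(1, n))
        c[n] = -ext[n] - acc / n
    return c


def lp_cepstra_via_dft(a: Sequence[float], num_ceps: int,
                       nfft: int = 1024) -> List[float]:
    """Cross-check only: inverse DFT of log(1 / |A(e^{jw})|^2) sampled at
    nfft points. The log spectrum is real and even, so a cosine sum suffices."""
    log_spec = []
    for m in range(nfft):
        w = 2.0 * math.pi * m / nfft
        A = sum(ai * cmath.exp(-1j * w * i) for i, ai in enumerate(a))
        log_spec.append(-math.log(abs(A) ** 2))
    c = [0.0] * (num_ceps + 1)
    for n in range(1, num_ceps + 1):
        c[n] = sum(v * math.cos(2.0 * math.pi * m * n / nfft)
                   for m, v in enumerate(log_spec)) / nfft
    return c
```

Only `lpc_to_cepstra` would be needed in practice. The DFT version costs far more and is included solely to show that the two definitions coincide.

For a one-pole filter with $a = (1, -\alpha)$, the recursion yields $c_n = \alpha^n / n$.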

The principal advantage of cepstral coefficients is that they are generally decorrelated, which allows diagonal covariances to be used in the HMMs. However, one minor problem with them is that the higher-order cepstra are numerically quite small. This results in a very wide range of variances when going from the low to the high cepstral coefficients. HTK does not have a problem with this. However, for pragmatic reasons such as displaying model parameters and flooring variances, it is convenient to re-scale the cepstral coefficients to have similar magnitudes. This is done by setting the configuration parameter CEPLIFTER to some value $L$ to lifter the cepstra according to the following formula:

$$c'_n = \left(1 + \frac{L}{2} \sin\frac{\pi n}{L}\right) c_n$$
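A sketch of the liftering step follows. It assumes the cepstra are held in a 0-based list whose index equals $n$, so `c[0]` passes through unchanged because $\sin 0 = 0$. The function name `lifter` is hypothetical. Rejecting $L \le 0$ is a choice of this sketch, not documented HTK behaviour.

```python
import math
from typing import List, Sequence


def lifter(c: Sequence[float], L: float) -> List[float]:
    """Apply c'_n = (1 + L/2 * sin(pi * n / L)) * c_n to every index n."""
    if L <= 0:
        raise ValueError("this sketch expects a positive lifter length L")
    return [(1.0 + 0.5 * L * math.sin(math.pi * n / L)) * cn
            for n, cn in enumerate(c)]
```

With $L = 22$, the weight grows from about 2.6 at $n = 1$ to its maximum of 12 at $n = 11$. This boosts the numerically small higher-order cepstra towards the magnitude of the low-order ones.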

As an example, the following configuration parameters would use a 14th-order linear prediction analysis to generate 12 liftered LP cepstra per target vector:

    TARGETKIND = LPCEPSTRA
    LPCORDER = 14
    NUMCEPS = 12
    CEPLIFTER = 22
These are typical of the values needed to generate a good front-end parameterisation for a speech recogniser based on linear prediction.   

Finally, note that the conversions supported by HTK are not limited to the case where the source is a waveform. HTK can convert any LP-based parameter into any other LP-based parameter.
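The text does not describe how HTK performs these LP-to-LP conversions internally. As one illustration, reflection coefficients and filter coefficients can be converted into each other without returning to the waveform:

- Running step 3 of the recursion on its own ("step-up") turns $\{k_i\}$ into $\{a_i\}$.
- Inverting step 3 ("step-down") recovers $\{k_i\}$ from $\{a_i\}$.

The Python sketch below uses hypothetical function names and the same indexing as the equations (`a[0] = 1`, `k[0]` unused). The step-down assumes a stable filter, i.e. $|k_i| < 1$.

```python
from typing import List, Sequence


def refl_to_lpc(k: Sequence[float]) -> List[float]:
    """Step-up: k[1..p] -> a[0..p] using a_j^(i) = a_j^(i-1) - k_i a_{i-j}^(i-1)
    and a_i^(i) = -k_i."""
    p = len(k) - 1
    a = [1.0] + [0.0] * p
    for i in range(1, p + 1):
        prev = a[:]                          # order i-1 coefficients
        for j in range(1, i):
            a[j] = prev[j] - k[i] * prev[i - j]
        a[i] = -k[i]
    return a


def lpc_to_refl(a: Sequence[float]) -> List[float]:
    """Step-down: a[0..p] -> k[1..p], undoing one order at a time with
    k_i = -a_i^(i) and a_j^(i-1) = (a_j^(i) + k_i a_{i-j}^(i)) / (1 - k_i^2)."""
    p = len(a) - 1
    cur = list(a)
    k = [0.0] * (p + 1)
    for i in range(p, 0, -1):
        ki = -cur[i]
        if abs(ki) >= 1.0:
            raise ValueError("unstable filter: |k_i| >= 1")
        k[i] = ki
        denom = 1.0 - ki * ki
        prev = cur[:]                        # order i coefficients
        for j in range(1, i):
            cur[j] = (prev[j] + ki * prev[i - j]) / denom
        cur[i] = 0.0
    return k
```

For example, `refl_to_lpc([0.0, 0.4])` returns `[1.0, -0.4]`, and `lpc_to_refl` applied to that result gives back `[0.0, 0.4]`.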



ECRL HTK_V2.1