Dan Ellis : Resources : Matlab : PLP, Rasta, MFCC :

Reproducing the feature outputs of common programs
using Matlab and melfcc.m


When I decided to implement my own version of warped-frequency cepstral features (such as MFCC) in Matlab, I wanted to be able to duplicate the output of the common programs used for these features, as well as to be able to invert the outputs of those programs. This page gives some examples of how cepstra can be calculated by three common programs (HTK's HCopy, feacalc from SPRACHcore, and mfcc.m from Malcolm Slaney's Auditory Toolbox for Matlab), and how to duplicate the results (or very nearly) using my melfcc.m routine. This also automatically shows you how to invert cepstra calculated by either path into spectrograms or waveforms using invmelfcc.m, since its arguments are the same.

HTK MFCC

2013-02-26: For an emulation of HTK's MFCC calculation accurate to the 3rd decimal place, see the modified rastamat code in calc_mfcc. The main differences were that HTK applies pre-emphasis independently on each window, and also removes the mean on each window.

Calculating features in HTK is done via HCopy, which can convert between a wide range of representations - including waveform to cepstra. HCopy takes its options from a config file. Thus, to convert 16 kHz sampled soundfiles to standard Mel-frequency cepstral coefficients (MFCCs), you would have a file config.mfcc containing:


SOURCEKIND = WAVEFORM
SOURCEFORMAT = WAVE
SOURCERATE = 625
TARGETKIND = MFCC_0
TARGETRATE = 100000.0
WINDOWSIZE = 250000.0
USEHAMMING = T
PREEMCOEF = 0.97
NUMCHANS = 20
CEPLIFTER = 22
NUMCEPS = 12

(The SOURCEFORMAT option specifies that the wavefiles are in MSWAVE format.) Then to calculate the features, you simply run HCopy from the Unix command line:


$ HCopy -C config.mfcc sa1.wav sa1-mfcc.htk

We can emulate this processing in Matlab, and compare the results, as below: (Note that the ">>" at the start of each line is an image, so you can cut and copy multiple lines of text directly into Matlab without having to worry about the prompts).

 % Load a speech waveform
 [d,sr] = wavread('sa1.wav');
 % Calculate HTK-style MFCCs
 mfc = melfcc(d, sr, 'lifterexp', -22, 'nbands', 20, ...
      'dcttype', 3, 'maxfreq',8000, 'fbtype', 'htkmel', 'sumpower', 0);
 % Load the features from HCopy and compare:
 htkmfc = readhtk('sa1-mfcc.htk');
 % Reorder and scale to be like mefcc output
 htkmfc = 2*htkmfc(:, [13 [1:12]])';
 % (melfcc.m is 2x HCopy because it deals in power, not magnitude, spectra)
 subplot(311)
 imagesc(htkmfc); axis xy; colorbar
 title('HTK MFCC');
 subplot(312)
 imagesc(mfc); axis xy; colorbar
 title('melfcc MFCC');
 subplot(313)
 imagesc(htkmfc - mfc); axis xy; colorbar
 title('difference HTK - melfcc');
 % Difference occasionally peaks at as much as a few percent (unexplained), 
 % but is basically negligable

 % Invert the HTK features back to waveform, auditory spectrogram, 
 % regular spectrogram (same args as melfcc())
 [dr,aspec,spec] = invmelfcc(htkmfc, sr, 'lifterexp', -22, 'nbands', 20, ...
      'dcttype', 3, 'maxfreq',8000, 'fbtype', 'htkmel', 'sumpower', 0);
 subplot(311)
 imagesc(10*log10(spec)); axis xy; colorbar
 title('Short-time power spectrum inverted from HTK MFCCs')
 subplot(312)
 specgram(dr,512,sr); colorbar
 title('Spectrogram of reconstructed (noise-excited) waveform');
 subplot(313)
 specgram(d,512,sr); colorbar
 title('Original signal spectrogram');
 % Spectrograms look pretty close, although noise-excitation
 % of reconstruction gives it a weird 'whispering crowd' sound

HTK PLP

HTK can also calculate PLP features. It turns out that these are somewhat different from the MFCC features because the cepstra are calculated by a different algorithm. However, we can still emulate and invert them with different parameters. To calculate PLP features with HCopy, we need a new config file, config.plp:

SOURCEKIND = WAVEFORM
SOURCEFORMAT = WAVE
SOURCERATE = 625
TARGETKIND = PLP_0
TARGETRATE = 100000.0
WINDOWSIZE = 250000.0
USEHAMMING = T
PREEMCOEF = 0.97
NUMCHANS = 20
CEPLIFTER = 22
NUMCEPS = 12
USEPOWER = T
LPCORDER = 12

(TARGETKIND is changed, and USEPOWER and LPCORDER are added). Then we calculate the features:


$ HCopy -C config.plp sa1.wav sa1-plp.htk

.. and compare to the Matlab version:

 [d,sr] = wavread('sa1.wav');
 % Calculate HTK-style PLPs
 plp = melfcc(d, sr, 'lifterexp', -22, 'nbands', 20, ...
      'dcttype', 1, 'maxfreq', 8000, 'fbtype', 'htkmel', ...
      'modelorder', 12, 'usecmp',1);
 % Load the HCopy features
 htkplp = readhtk('sa1-plp.htk');
 % Reorder (no scaling in this case)
 htkplp = htkplp(:, [13 [1:12]])';
 subplot(311)
 imagesc(htkplp); axis xy; colorbar
 title('HTK PLP');
 subplot(312)
 imagesc(plp); axis xy; colorbar
 title('melfcc PLP');
 subplot(313)
 imagesc(htkplp - plp); axis xy; colorbar
 title('difference HTK - melfcc');
 % Unexplained differences can be up to 20% for higher-order 
 % cepstra, but essentially the same

 % Invert the HTK features back again by mirroring args to melfcc
 [dr,aspec,spec] = invmelfcc(htkplp, sr, 'lifterexp', -22, 'nbands', 20, ...
      'dcttype', 1, 'maxfreq', 8000, 'fbtype', 'htkmel', ...
      'modelorder', 12, 'usecmp', 1);
 subplot(311)
 imagesc(10*log10(spec)); axis xy; colorbar
 title('Short-time power spectrum inverted from HTK PLPs')
 subplot(312)
 specgram(dr,512,sr); colorbar
 title('Spectrogram of reconstructed (noise-excited) waveform');
 subplot(313)
 specgram(d,512,sr); colorbar
 title('Original signal spectrogram');
 % Pretty close

feacalc MFCC

feacalc is the main feature calculation program from ICSI's SPRACHcore package. It's actually a wrapper around the older rasta. which was the original C-language implementation of RASTA and PLP feature calculation. feacalc has been expanded to be able to calculate (its own version of) MFCC features, so to parallel the HTK examples above, we'll start with feacalc's MFCC feature. They can be calculated with the following command line:


$ feacalc -sr 16000 -nyq 8000 -delta 0 -ras no -plp no \
    -dom cep -com no -frq mel -filt tri -cep 13 -opf htk \
    sa1.wav -o sa1-fcmfc.htk

and we duplicate this in Matlab as follows:

 [d,sr] = wavread('sa1.wav');
 % Calculate Feacalc-style MFCCs
 % (scale to match normalization of Mel filters)
 mfc2 = melfcc(d*5.5289, sr, 'lifterexp', 0.6, 'nbands', 19, ...
      'dcttype', 4, 'maxfreq', 8000, 'fbtype', 'fcmel', 'preemph', 0);
 % Load the HCopy features
 fcmfc = readhtk('sa1-fcmfc.htk');
 % No need to reorder or scale, just transpose
 fcmfc = fcmfc';
 subplot(311)
 imagesc(fcmfc(2:13,:)); axis xy; colorbar
 title('feacalc MFCC');
 subplot(312)
 imagesc(mfc2(2:13,:)); axis xy; colorbar
 title('melfcc MFCC (feacalc-style)');
 subplot(313)
 imagesc(fcmfc - mfc2); axis xy; colorbar
 title('difference feacalc - melfcc');
 % Small differences in high-order cepstra due to 
 % cumulative errors in Mel filter shapes


.. and inverting works just the same as above.

feacalc PLP

feacalc was originally designed to calculate PLP (and Rasta) features, so this is its more 'native' invocation:


$ feacalc -sr 16000 -nyq 8000 -delta 0 -ras no -dom cep -plp 12 \
    -opf htk sa1.wav -o sa1-fcplp.htk

..which we duplicate this in Matlab as follows:

 [d,sr] = wavread('sa1.wav');
 % Calculate Feacalc-style PLPs
 plp2 = melfcc(d, sr, 'lifterexp', 0.6, 'nbands', 21, ...
      'dcttype', 1, 'maxfreq', 8000, 'fbtype', 'bark', 'preemph', 0, ...
      'numcep', 13, 'modelorder', 12, 'usecmp', 1);
 % Load the HCopy features
 fcplp = readhtk('sa1-fcplp.htk');
 % just transpose
 fcplp = fcplp';
 subplot(311)
 imagesc(fcplp(2:13,:)); axis xy; colorbar
 title('feacalc PLP');
 subplot(312)
 imagesc(plp2(2:13,:)); axis xy; colorbar
 title('melfcc PLP (feacalc-style)');
 subplot(313)
 imagesc(fcplp - plp2); axis xy; colorbar
 title('difference feacalc - melfcc');
 % A few localized differences due windows etc.

.. and once again inverting works just the same as above.

Auditory Toolbox mfcc.m

The most popular tool for calculating MFCCs in Matlab is mfcc.m from Malcolm Slaney's Auditory Toolbox. This is what I used for a long time, until I needed something with more flexibility. That flexibility includes being able to duplicate mfcc.m. Here's how we can compare them in Matlab.

 [d,sr] = wavread('sa1.wav');
 % Calculate MFCCs using mfcc.m from the Auditory Toolbox
 % (gain should be 2^15 because melfcc scales by that amount, 
 % but in this case mfcc uses 2x FFT len)
 ce = mfcc(d*(2^14), sr);
 % Scale them to match (log_10 and power)
 ce = log(10)*2*ce;
 % Duplicate with melfcc.m
 mfc3 = melfcc(d, sr, 'lifterexp', 0, 'minfreq', 133.33, ...
      'maxfreq', 6855.6, 'wintime', 0.016, 'sumpower', 0);
 % .. and compare:
 subplot(311)
 imagesc(ce(2:13,:)); axis xy; colorbar
 title('Auditory Toolbox MFCC');
 subplot(312)
 imagesc(mfc3(2:13,:)); axis xy; colorbar
 title('melfcc MFCC (AudToolbox style)');
 subplot(313)
 imagesc(ce - mfc3); axis xy; colorbar
 title('difference AudTBox - melfcc');
 % Small differences mainly due to hanning vs. hamming 

Notes on the differences between different MFCCs


Last updated: $Date: 2013/02/26 17:00:16 $

Dan Ellis <[email protected]>