Sinewave Speech Analysis/Synthesis in Matlab

Introduction

Sinewave speech is a curious phenomenon in which a small number of sinusoids added together take on some of the characteristics of speech, which in most respects they do not resemble at all. Three sinusoids that track the frequency and amplitude of the first three speech formants are enough to achieve high intelligibility. The phenomenon has been investigated extensively by Robert Remez, Philip Rubin and others. There is a much more detailed description at the web site of Haskins Laboratories in New Haven, CT, where much of the work was done.

The Haskins site includes several example analysis files that you can download. These files contain, in a compact form, all the data you need to resynthesize the sinewave speech. The Matlab routines below do this for you.

README - usage details
synthtrax.m - the main synthesis routine
slinterp.m - subsidiary linear interpolation routine
readswi.m - function to read the SWI-format data files into Matlab
s1pars.swi, s6pars.swi - example parameter files from the Haskins site
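
To make the resynthesis step concrete, here is a minimal sketch of how frame-rate frequency and magnitude tracks can be turned into a waveform: each track is linearly interpolated up to the sample rate, the frequency is integrated to give phase, and the resulting sinusoids are summed. This is not the code in synthtrax.m or slinterp.m. The sample rate, hop size, and the contents of F and M are made-up values chosen only for illustration, and interp1 stands in for whatever interpolation the real routines use.

```matlab
% Sketch of sinewave synthesis from frame-rate tracks (NOT the actual synthtrax.m).
% Assumed, hypothetical values: 8 kHz sample rate, 128-sample hop, two made-up tracks.
sr  = 8000;                      % output sampling rate, Hz
hop = 128;                       % samples between successive track columns
F = [ 500  550  600  600;        % frequency tracks in Hz, one row per sinusoid
     1500 1400 1300 1300];
M = [0.5 0.5 0.4 0.0;            % linear magnitude tracks, same layout as F
     0.2 0.3 0.3 0.0];

[ntracks, nframes] = size(F);
tframe = (0:nframes-1) * hop;            % sample index of each track column
tsamp  = 0:(nframes-1)*hop;              % index of every output sample
x = zeros(1, numel(tsamp));
for k = 1:ntracks
    f = interp1(tframe, F(k,:), tsamp);  % linear interpolation up to sample rate
    m = interp1(tframe, M(k,:), tsamp);
    phase = 2*pi*cumsum(f)/sr;           % integrate frequency to obtain phase
    x = x + m .* cos(phase);             % add this sinusoid into the mix
end
% soundsc(x, sr);                        % uncomment to listen
```

With these assumed values each track column spans 128/8000 = 16 ms. Integrating the interpolated frequency, rather than evaluating cos(2*pi*f*t) directly, keeps the phase continuous while the frequency glides.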

Sinewave analysis

I was developing some examples of LPC analysis for my speech and audio class, and to my surprise, crude translation of LPC pole positions does a pretty good job of extracting sinewave speech parameters. Thus, I am pleased to offer the following routines:

Main routine: [F,M] = swsmodel(D,R,H) returns four sinusoids, with frequencies given by the rows of F and magnitudes by the rows of M, tracking the formants in the speech sample D (sampled at rate R). Each column of F and M corresponds to H samples, so the analysis frame rate is R/H. Note: the sound is resampled to 8 kHz within the routine to focus the LPC on the main formant region, below 4 kHz.
Support routine: [A,G,E] = lpcfit(D,P,H,W,OV) fits P-th order LPC (all-pole, autoregressive) models to the sound waveform D, using W-point windows advanced by H samples. Rows of A contain the all-pole filter coefficients [1 a1 a2 .. aP], with the corresponding elements of G giving the frame gain (residual RMS). E is the actual excitation residual. Setting OV to zero disables overlap-add of the residual, which gives perfect reconstruction but a less useful E.
Support routine: [F,M] = lpca2frq(A) factorizes the LPC polynomial defined in each row of A (as from lpcfit.m) and returns the sorted positive frequencies (up to P/2 of them) in columns of F, each with a corresponding approximate magnitude in M.
Bonus routine: D = lpcsynth(A,G,E,H,OV) resynthesizes from LPC parameters returned by lpcfit, or using noise excitation if E is omitted.
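
The idea of translating LPC pole positions into sinusoid frequencies can be sketched for a single frame as follows. This is an illustration under stated assumptions, not the code in lpca2frq.m: the filter here is built from two made-up resonances so that the answer is known in advance, and the magnitude estimate 1/(1-|pole|) is just one plausible proxy for how strongly each pole resonates.

```matlab
% Sketch: turn one frame's LPC polynomial into formant-like frequencies.
% Assumption: a hypothetical all-pole filter built from two known resonances.
sr   = 8000;                             % sampling rate, Hz
fres = [700 1800];                       % made-up resonance frequencies, Hz
rad  = [0.97 0.95];                      % made-up pole radii (nearer 1 = sharper peak)
p = rad .* exp(1j*2*pi*fres/sr);         % poles in the upper half of the z-plane
a = real(poly([p conj(p)]));             % LPC-style coefficients [1 a1 a2 a3 a4]

r = roots(a);                            % factorize the polynomial
r = r(imag(r) > 0);                      % keep one pole from each conjugate pair
[f, idx] = sort(angle(r) * sr / (2*pi)); % pole angle -> frequency in Hz, ascending
m = 1 ./ (1 - abs(r(idx)));              % crude magnitude proxy (an assumption)
```

Because each complex-conjugate pole pair contributes one resonance, a P-th order model yields at most P/2 positive frequencies, which is why a low-order fit gives only a handful of candidate tracks per frame.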

An example use is shown below:

    >> % Note: wavread has been removed from recent Matlab releases;
    >> % audioread('mpgr1_sx419.wav') returns the same [d,r] pair there
    >> [d,r] = wavread('mpgr1_sx419.wav');
    >> [F,M] = swsmodel(d,r);
    >> plot(F');   % show all the frequencies
    >> dr = synthtrax(F,M,r);
    >> % Listen to it
    >> sound(dr,r)
    >> % Compare to noise-excited reconstruction of LPC analysis
    >> [a,g] = lpcfit(d);
    >> dl = lpcsynth(a,g);
    >> sound(dl,r);
    >> % The LPC reconstruction is based on more or less the same information
    >> % as the sinewave replica, but it sounds less 'weird'
    >> % Compare the spectrograms
    >> subplot(311)
    >> specgram(d,256,r);
    >> title('Original');
    >> subplot(312)
    >> specgram(dr,256,r);
    >> title('Sine wave replica');
    >> subplot(313)
    >> specgram(dl,256,r);
    >> title('Noise-excited LPC reconstruction');

[Figure: spectrograms of the original, the sine wave replica, and the noise-excited LPC reconstruction]

Referencing

If you wish to reference this code in your publications, you can use the following citation:

    D. P. W. Ellis (2004) 
    "Sinewave Speech Analysis/Synthesis in Matlab", 
    Web resource, available: http://www.ee.columbia.edu/ln/labrosa/matlab/sws/ .

Last updated: 2016/04/17 23:33:41

Dan Ellis