E6820 Assignment 1
Reading assignment
Paper: J. L. Flanagan, R. M. Golden, “Phase Vocoder”, Bell
System Technical Journal, November 1966, 1493-1509.
Summary:
This paper explains the mechanics of the phase vocoder for
analysis and re-synthesis of vocal signals. The paper then goes on to explain
a couple of uses for the phase vocoder. First, it can be used
to reduce the bandwidth of a signal being sent over an analog channel.
Second, it can be used to compress or expand the time
scale of a signal by an arbitrary factor. The paper also
touches on possible improvements, namely that the center frequencies of
the bands should probably not be spaced on a strictly linear scale.
Thoughts:
Before re-synthesis, the magnitude and phase derivative signals are run
through low-pass filters with cutoffs at 25 Hz, which is 1/4 the cutoff
of the low-pass filters during the analysis phase. The
authors' justification for this filtering is the "experimental results
of the present study." That's it. The authors don't
elaborate. If, as they say, there is no easy analytical way
to calculate the bandwidth of the magnitude and phase derivative
signals, then the authors should explain more precisely how they came
up with their rule of thumb.
The same holds true for their other findings. How did they
decide that a signal was still intelligible? Did the authors
simply listen to the results themselves? Did they ask others
to listen? Did they recruit volunteers and run some kind of
formal test? In the section on multiplexing for transmission,
the authors simply state that reductions by a factor of 2 or 3 will
allow "transmission of acceptable quality." Who decided what
"acceptable quality" was?
Practical assignment
Here are the spectrograms for mdwh0_sx305.wav. The top plot
is a wide band spectrogram with a window size of 256 samples. The
bottom plot is a narrow band spectrogram with a window size of 1024 samples.
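To make the difference between the two window sizes concrete, here is a rough sketch of the time and frequency resolution each one implies. The sampling rate is an assumption: this page does not state it, and 16 kHz is used only for illustration. The function name window_resolution is likewise just a name made up for this example.

```python
def window_resolution(window_size, sample_rate_hz):
    """Return (window duration in ms, FFT bin spacing in Hz)."""
    duration_ms = 1000 * window_size / sample_rate_hz
    bin_spacing_hz = sample_rate_hz / window_size
    return duration_ms, bin_spacing_hz


# Assumption: the sampling rate of the recording is not given on this page.
ASSUMED_SAMPLE_RATE_HZ = 16000

for label, size in (("wide band", 256), ("narrow band", 1024)):
    ms, hz = window_resolution(size, ASSUMED_SAMPLE_RATE_HZ)
    print(f"{label}: {size} samples = {ms:.1f} ms, bins every {hz:.3f} Hz")
```

Whatever the real sampling rate is, the shorter window gives four times finer time resolution and four times coarser frequency resolution than the longer one. That is why the 256-sample plot smears individual harmonics into broad formant bands, while the 1024-sample plot resolves the harmonics themselves.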

To tease out the formants of the vowel in "chives," I zoomed in on
the wide band spectrogram between about 1.7 and 2 seconds.

It looks like the first formant is somewhere around 700 Hz, the second
formant is at about 1600 Hz, and the third formant is at about 2700 Hz.
The guy who spoke this sample had an interesting accent.
Normally, I would have said that the main vowel section of
"chives" was a diphthong, namely [aːɪ]. When you listen to the
recording carefully and when you look at the spectrogram, however, it's
obvious that he really only speaks a single vowel.
Christine Smit