E6820 Assignment 1

Reading assignment

Paper: J. L. Flanagan, R. M. Golden, “Phase Vocoder”, Bell System Technical Journal, November 1966, 1493-1509.

Summary:
This paper explains the mechanics of the phase vocoder for analysis and re-synthesis of vocal signals. The paper then goes on to explain a couple of uses for the phase vocoder. First, it can be used to reduce the bandwidth of a signal being sent over an analog channel. Second, it can be used to compress or expand the time scale of a signal by an arbitrary factor. The paper also touches on possible improvements, namely that the center frequencies of the bands should probably not be spaced on a strictly linear scale.

Thoughts:
Before re-synthesis, the magnitude and phase derivative signals are run through low-pass filters with cutoffs at 25 Hz, which is one quarter of the cutoff of the low-pass filters used in the analysis stage. The authors' only justification for this filtering is the "experimental results of the present study." They don't elaborate. If, as they say, there is no easy analytical way to calculate the bandwidth of the magnitude and phase derivative signals, then the authors should explain more precisely how they came up with their rule of thumb.

The same holds true for their other findings. How did they decide that a signal was still intelligible? Did the authors simply listen to the results themselves? Did they ask others to listen? Did they recruit volunteers and run some kind of formal test? In the multiplexing for transmission section, the authors simply state that reductions by a factor of 2 or 3 will allow "transmission of acceptable quality." Who decided what "acceptable quality" was?
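
To make the filtering step discussed above concrete, here is a minimal sketch of a single analysis channel. It is my own illustration, not the authors' filter design. It assumes the channel is realized by heterodyning the input down by the channel's center frequency, low-pass filtering at 100 Hz (four times the 25 Hz figure), and then smoothing the magnitude and phase derivative at 25 Hz. The windowed-sinc filters, the tap count, the 8 kHz sampling rate, and the names lowpass_fir and analyze_channel are all assumptions made for the example.

```python
import numpy as np

def lowpass_fir(cutoff_hz, fs, n_taps=801):
    """Windowed-sinc low-pass FIR filter, normalized to unity gain at DC."""
    n = np.arange(n_taps) - (n_taps - 1) / 2
    h = np.sinc(2 * cutoff_hz / fs * n) * np.hamming(n_taps)
    return h / h.sum()

def analyze_channel(x, fs, center_hz, analysis_cutoff_hz=100.0, smooth_cutoff_hz=25.0):
    """Return the smoothed magnitude and phase derivative (in Hz) of one channel."""
    t = np.arange(len(x)) / fs
    # Shift the band around center_hz down to 0 Hz, then keep only the low band.
    baseband = x * np.exp(-2j * np.pi * center_hz * t)
    envelope = np.convolve(baseband, lowpass_fir(analysis_cutoff_hz, fs), mode="same")
    magnitude = np.abs(envelope)
    phase = np.unwrap(np.angle(envelope))
    phase_deriv_hz = np.gradient(phase) * fs / (2 * np.pi)
    # Extra smoothing before re-synthesis (the 25 Hz step questioned above).
    smooth = lowpass_fir(smooth_cutoff_hz, fs)
    return (np.convolve(magnitude, smooth, mode="same"),
            np.convolve(phase_deriv_hz, smooth, mode="same"))

fs = 8000                              # assumed sampling rate for this demo
t = np.arange(fs) / fs                 # one second of signal
x = np.cos(2 * np.pi * 520 * t)        # test tone 20 Hz above the channel center
mag, dphi = analyze_channel(x, fs, center_hz=500.0)
mid = slice(2000, 6000)                # ignore filter edge effects
```

For a steady tone, the magnitude settles near half the tone's amplitude and the phase derivative settles near the tone's offset from the channel center. With real speech both signals vary over time, which is exactly why the choice of the 25 Hz cutoff deserves a better justification.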

Practical assignment

Here are the spectrograms for mdwh0_sx305.wav. The top plot is a wide band spectrogram with a window size of 256 samples. The bottom plot is a narrow band spectrogram with a window size of 1024 samples.
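
As a rough illustration of the trade-off between the two window sizes, here is a minimal sketch that computes both spectrograms with plain NumPy. It is not the code used to produce the plots. The 16 kHz sampling rate, the Hann window, the 75% overlap, and the two-tone stand-in signal are assumptions made so the example runs without the audio file, and spectrogram_db is a hypothetical helper name.

```python
import numpy as np

def spectrogram_db(x, n_window, hop=None):
    """Magnitude spectrogram in dB, shape (frequency bins, frames)."""
    if hop is None:
        hop = n_window // 4            # 75% overlap between frames
    window = np.hanning(n_window)
    n_frames = 1 + (len(x) - n_window) // hop
    frames = np.stack([x[i * hop : i * hop + n_window] * window
                       for i in range(n_frames)])
    spec = np.abs(np.fft.rfft(frames, axis=1))
    return 20 * np.log10(spec + 1e-10).T

fs = 16000                             # assumed sampling rate
t = np.arange(fs) / fs                 # one second of signal
# Stand-in for the recording: two tones instead of real speech.
x = np.sin(2 * np.pi * 700 * t) + 0.5 * np.sin(2 * np.pi * 1600 * t)

wide = spectrogram_db(x, 256)          # short window: fine time detail
narrow = spectrogram_db(x, 1024)       # long window: fine frequency detail

wide_bin_hz = fs / 256                 # 62.5 Hz between frequency bins
narrow_bin_hz = fs / 1024              # 15.625 Hz between frequency bins
```

At the assumed sampling rate, the 256-sample window spans 16 ms and the 1024-sample window spans 64 ms. The shorter window gives more frames per second but coarser frequency bins, which is why the wide band plot is the better one for reading off formants and the narrow band plot is the better one for seeing individual harmonics.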

Spectrograms of mdwh0_sx305.wav

To tease out the formants of the vowel in "chives," I zoomed in on the wide band spectrogram between about 1.7 and 2 seconds.

Zoom for the vowel in "chives"

It looks like the first formant is somewhere around 700 Hz, the second formant is at about 1600 Hz, and the third formant is at about 2700 Hz. The guy who spoke this sample had an interesting accent. Normally, I would have said that the main vowel section of "chives" was a diphthong, namely [aːɪ]. When you listen to the recording carefully and when you look at the spectrogram, however, it's obvious that he really only speaks a single vowel.

Christine Smit