E6820 Assignment 5

Reading assignment

Paper: “Applying the Harmonic plus Noise Model in concatenative speech synthesis,” Yannis Stylianou, IEEE Transactions on Speech and Audio Processing, 9(1), 21-30, Jan 2001.

Summary:

This paper is about concatenative speech synthesis: stringing together many short recorded speech units to form full words and sentences, which is a rather interesting problem. Joining units raises a few immediate problems. Most importantly, you have to make sure that the phase and pitch match across each boundary. The paper applies a pretty unusual model of speech, the Harmonic plus Noise Model (HNM), to deal with these problems. Basically, the author assumes that voiced speech can be modeled as a fundamental plus its harmonics below a "maximum voiced frequency" and as modulated noise above that frequency. I'm not sure what motivated this model, but it seems to work.
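
To make the split at the maximum voiced frequency concrete, here is a toy sketch in Python (standard library only). It is not the paper's analysis or synthesis procedure. The function names, the equal harmonic amplitudes, the raised-cosine envelope, and the use of random-phase sinusoids as a stand-in for band-limited noise are all assumptions made for illustration.

```python
import math
import random

def harmonic_freqs(f0, fmax_voiced):
    """Multiples of f0 that lie strictly below the maximum voiced frequency."""
    freqs = []
    k = 1
    while k * f0 < fmax_voiced:
        freqs.append(k * f0)
        k += 1
    return freqs

def hnm_frame(f0, fmax_voiced, sr, n, n_noise=50, noise_gain=0.05, seed=0):
    """Toy voiced frame: harmonics below fmax_voiced, modulated 'noise' above it."""
    rng = random.Random(seed)
    harm = harmonic_freqs(f0, fmax_voiced)
    # Assumption: approximate the noise band with random-frequency,
    # random-phase sinusoids between fmax_voiced and the Nyquist frequency.
    noise = [(rng.uniform(fmax_voiced, sr / 2), rng.uniform(0, 2 * math.pi))
             for _ in range(n_noise)]
    frame = []
    for t in range(n):
        sec = t / sr
        # Harmonic part, with equal amplitudes that sum to 1 (an assumption).
        h = sum(math.cos(2 * math.pi * f * sec) / len(harm) for f in harm)
        # Assumed envelope that repeats once per pitch period.
        env = 0.5 * (1 + math.cos(2 * math.pi * f0 * sec))
        u = sum(math.cos(2 * math.pi * f * sec + ph) for f, ph in noise) / n_noise
        frame.append(h + noise_gain * env * u)
    return frame

# 10 ms of a 100 Hz voice at 16 kHz, voiced up to 4 kHz (hypothetical values).
frame = hnm_frame(100.0, 4000.0, 16000, 160)
```

With these hypothetical values, the harmonic part contains the 39 multiples of 100 Hz below 4 kHz, and everything above 4 kHz comes from the noise stand-in.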

Thoughts:

I almost can't believe the model works, but the tests reported in the paper seem to show it does.

Practical assignment

(a) Why does this work?

    freqs = angle(poles')*sr/2/pi;

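This line calculates the angle of each pole. MATLAB's angle returns values in (-pi, pi], in radians per sample. One full trip around the unit circle (2*pi) corresponds to the sampling rate sr, so multiplying by sr/(2*pi) maps each angle to a frequency in Hz, in the range (-sr/2, sr/2].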
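
As a quick check of this mapping, here is a minimal sketch in Python rather than MATLAB (standard library only). The function name pole_freq_hz and the example pole are made up for illustration. cmath.phase plays the role of MATLAB's angle and returns a value between -pi and pi.

```python
import cmath
import math

def pole_freq_hz(pole, sr):
    """Convert a pole's angle (radians per sample) to a frequency in Hz."""
    return cmath.phase(pole) * sr / (2 * math.pi)

sr = 8000                                   # hypothetical sampling rate in Hz
pole = 0.95 * cmath.exp(1j * math.pi / 4)   # pole at angle pi/4, radius 0.95

f_pos = pole_freq_hz(pole, sr)              # approximately sr/8 = 1000 Hz
f_neg = pole_freq_hz(pole.conjugate(), sr)  # conjugate pole: approximately -1000 Hz
print(f_pos, f_neg)
```

The radius of the pole has no effect on the frequency. Only its angle does.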

    mags = g(i) ./ (1 - abs(poles'));

What we are doing is an approximation. Basically, we're saying that when you're on the unit circle close to a pole, the magnitude is essentially explained by that nearby pole. The other poles aren't going to make a big difference. If we had a single-pole system, we'd have something like W(z) = g(i)/(z - pole(i)). Evaluated on the unit circle at the pole's angle, we have W(e^(jw)) = g(i)/[e^(jw) - pole(i)], where w is the angle of the pole. So, then we'd have

    mags = g(i) ./ abs( exp(j*w) - pole);

because g(i) is positive. We can take this one step further and say that abs(exp(j*w) - pole) = abs(exp(j*w)) - abs(pole), because the point on the unit circle has the same angle as the pole and the pole lies inside the unit circle (abs(pole) < 1). Since abs(exp(j*w)) = 1, we then have the approximation above, mags = g(i) ./ (1 - abs(poles')).
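
The single-pole argument can be checked numerically. This is a small sketch in Python rather than MATLAB (standard library only). The function names, the gain, and the pole are hypothetical values chosen for illustration.

```python
import cmath

def single_pole_mag(g, pole, w):
    """Magnitude of g / (e^(jw) - pole) at angle w on the unit circle."""
    return g / abs(cmath.exp(1j * w) - pole)

def peak_approx(g, pole):
    """The shortcut from the assignment code: g / (1 - |pole|)."""
    return g / (1 - abs(pole))

g = 2.0                            # hypothetical gain
pole = 0.9 * cmath.exp(1j * 0.3)   # hypothetical pole inside the unit circle

w = cmath.phase(pole)
at_pole_angle = single_pole_mag(g, pole, w)      # evaluated at the pole's angle
shortcut = peak_approx(g, pole)                  # same value, up to rounding
off_angle = single_pole_mag(g, pole, w + 0.5)    # smaller away from the pole
print(at_pole_angle, shortcut, off_angle)
```

For a single pole the two expressions agree at the pole's angle, and the magnitude falls off as you move away from it. With several poles, the shortcut is only an approximation, because it ignores the contribution of the other poles.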

(b & c) You can see my code here. I have to say that the resynthesized voice was intelligible, but distinctly weird.

Project

Work on the project can be found on my project page here.

Christine Smit