E6820 Assignment 5
Reading assignment
Paper: “Applying the Harmonic plus Noise Model in concatenative
speech synthesis,” Yannis Stylianou, IEEE Transactions on Speech and
Audio Processing, 9(1), 21-30, Jan 2001.
Summary:
This paper deals with concatenating many short speech samples into
full words and sentences, which is a rather interesting problem.
There are a few immediate problems with stringing together
samples. Most importantly, you have to make sure that the phase
and pitch match across boundaries. The paper uses a somewhat
unusual model of speech to deal with these problems. Basically, the
author assumes that voiced speech can be modeled as a fundamental
plus harmonics below the "maximum voiced frequency" and as
modulated noise above that frequency. I'm not sure what motivated
this model, but it seems to work.
Thoughts:
I almost can't believe the model works, but the tests in the paper seem to show it does.
Practical assignment
(a) Why does this work?
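freqs = angle(poles')*sr/2/pi;
This line calculates the angles of the poles in radians per sample;
angle() returns values between -pi and pi. A full turn of 2*pi around
the unit circle corresponds to the sampling rate sr, so multiplying by
sr/2/pi maps each angle to a frequency in Hz between -sr/2 and sr/2.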
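To convince myself of the angle-to-frequency mapping, here is a minimal sketch in Python (standard library only, not the MATLAB from the assignment). The sampling rate and pole values are made-up examples, and pole_frequencies_hz is a hypothetical helper name, not something from the course code.

```python
import cmath
import math


def pole_frequencies_hz(poles, sr):
    """Map each pole's angle (radians per sample) to a frequency in Hz."""
    # cmath.phase returns an angle in [-pi, pi], like MATLAB's angle().
    return [cmath.phase(p) * sr / (2 * math.pi) for p in poles]


# Hypothetical example values, not taken from the assignment data.
sr = 8000
poles = [
    0.95 * cmath.exp(1j * math.pi / 4),   # pole at angle pi/4
    0.95 * cmath.exp(-1j * math.pi / 4),  # its complex conjugate
    0.90 * cmath.exp(1j * math.pi / 2),   # pole at angle pi/2
]

freqs = pole_frequencies_hz(poles, sr)
print(freqs)
```

With these values, the pole at angle pi/4 comes out at about 1000 Hz, its conjugate at about -1000 Hz, and the pole at pi/2 at about 2000 Hz, so conjugate pairs show up as matching positive and negative frequencies.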
mags = g(i) ./ (1 - abs(poles'));
What we are doing is an approximation. Basically, we're
saying that when you're on the unit circle near a pole, the magnitude
is essentially explained by that close-by pole. The
other poles aren't going to make a big difference. If we had a
single-pole system, we'd have something like W(z) = g(i)/(z - pole(i)).
Evaluated on the unit circle, we have W(e^(jw)) = g(i)/[e^(jw) -
pole(i)], where w is the angle of the pole. So, then we'd have
mags = g(i) ./ abs( exp(j*w) - pole);
because g(i) is positive. We can take this one step further. The point
exp(j*w) and the pole are at the same angle, so they lie on the same
ray from the origin. Assuming the pole is inside the unit circle (as
it is for a stable filter), the distance between them is just the
difference of their magnitudes: abs( exp(j*w) - pole) =
abs(exp(j*w)) - abs(pole). Since abs(exp(j*w)) = 1, we then have the
approximation above, mags = g(i) ./ (1 - abs(poles')).
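As a quick numerical check of the single-pole argument, here is a small Python sketch (standard library only). The gain and pole are made-up example values, and both function names are hypothetical, introduced just for this illustration.

```python
import cmath
import math


def exact_single_pole_mag(g, pole, w):
    """|g / (e^{jw} - pole)| evaluated at angle w on the unit circle."""
    return g / abs(cmath.exp(1j * w) - pole)


def approx_pole_mag(g, pole):
    """The shortcut from the assignment: g / (1 - |pole|)."""
    return g / (1 - abs(pole))


# Hypothetical example values: one pole inside the unit circle.
g = 1.0
pole = 0.95 * cmath.exp(1j * math.pi / 4)
w = cmath.phase(pole)  # evaluate exactly at the pole's own angle

exact = exact_single_pole_mag(g, pole, w)
approx = approx_pole_mag(g, pole)

# Away from the pole's angle, the true single-pole response is smaller.
off_peak = exact_single_pole_mag(g, pole, w + 0.5)

print(exact, approx, off_peak)
```

At the pole's own angle the two expressions agree up to rounding, so for a single pole the shortcut is exact there. The approximation only enters when the contributions of the other poles are ignored.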
(b & c) You can see my code here. I have to say that the resynthesized voice was intelligible, but distinctly weird.
Project
Work on the project can be found on my project page here.
Christine Smit