E6820 Assignment 4
Reading assignment
Paper: "YIN, a fundamental frequency estimator for speech and music," Alain de Cheveigné and Hideki Kawahara, Journal of the Acoustical Society of America, 2002.
Summary:
The authors describe a six-step process for finding the pitch in a windowed sample. The first step is the standard autocorrelation technique. The other five steps attempt to fix problems associated with the autocorrelation technique. Most of the improvement seems to come from step 2, the "difference function." The authors then show that the YIN technique performs better than quite a few other techniques they compared it against.
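To make steps 1 and 2 concrete, here is a minimal sketch (not the authors' code) that computes both an autocorrelation and a squared-difference function over integer lags and picks the best lag from each. The test signal, sample rate, window length, and lag search range are all assumptions chosen for illustration; the function names are hypothetical.

```python
import math

def autocorrelation(x, max_lag, window):
    """r(tau) = sum over the window of x[j] * x[j + tau]."""
    return [
        sum(x[j] * x[j + tau] for j in range(window))
        for tau in range(max_lag + 1)
    ]

def difference_function(x, max_lag, window):
    """d(tau) = sum over the window of (x[j] - x[j + tau])^2."""
    return [
        sum((x[j] - x[j + tau]) ** 2 for j in range(window))
        for tau in range(max_lag + 1)
    ]

# Assumed test signal: a 100 Hz sine sampled at 8 kHz (period = 80 samples).
sample_rate = 8000
x = [math.sin(2 * math.pi * 100.0 * n / sample_rate) for n in range(400)]

r = autocorrelation(x, max_lag=200, window=200)
d = difference_function(x, max_lag=200, window=200)

# Search an assumed lag range that excludes the trivial match at lag 0.
lags = range(20, 121)
peak_lag = max(lags, key=lambda tau: r[tau])  # autocorrelation: look for a peak
best_lag = min(lags, key=lambda tau: d[tau])  # difference function: look for a dip

print(sample_rate / peak_lag, sample_rate / best_lag)
```

On a clean sine both approaches agree; the cases where they diverge (and where the later steps matter) need a more realistic signal than this sketch uses.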
Thoughts:
I find it interesting that the authors allow a 20% margin of error. That's quite a substantial margin, especially since the authors claim that YIN should work well with music. Raising a frequency by 20% is approximately the same as going up by a minor third (3 semitones). That is a huge transcription error.
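The 20% figure can be checked with a couple of lines of arithmetic, assuming equal temperament (12 semitones per octave). The helper name below is hypothetical.

```python
import math

def ratio_to_semitones(ratio):
    """Convert a frequency ratio to an interval in equal-tempered semitones."""
    return 12 * math.log2(ratio)

semitones_for_20_percent = ratio_to_semitones(1.2)
minor_third_ratio = 2 ** (3 / 12)

print(round(semitones_for_20_percent, 2))  # 3.16
print(round(minor_third_ratio, 4))         # 1.1892
```

So a 20% error is slightly more than a minor third, whose frequency ratio is about 1.19.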
Step 5 is rather interesting because it takes sub-sample wavelengths into account. That is to say, this step allows YIN to find wavelengths that are, for example, 100.3 samples long. Autocorrelation alone can't do that, since it is only evaluated at whole-sample lags.
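One common way to get a sub-sample estimate is to fit a parabola through the minimum and its two neighbours. The sketch below shows that idea on a synthetic curve; the function name and the test data are assumptions, and it is only meant to illustrate the interpolation, not to reproduce the paper's implementation.

```python
def parabolic_minimum(values, i):
    """Refine the integer index i of a local minimum to a fractional index
    by fitting a parabola through values[i - 1], values[i], values[i + 1]."""
    left, mid, right = values[i - 1], values[i], values[i + 1]
    curvature = left - 2 * mid + right
    if curvature == 0:
        return float(i)  # flat neighbourhood: nothing to refine
    return i + 0.5 * (left - right) / curvature

# Synthetic stand-in for a difference function whose true minimum is at 100.3.
values = [(t - 100.3) ** 2 for t in range(200)]
coarse = min(range(1, 199), key=lambda t: values[t])
refined = parabolic_minimum(values, coarse)

print(coarse, refined)
```

Because the synthetic curve is itself a parabola, the fit recovers 100.3 almost exactly; on a real difference function the result would only be an approximation.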
Practical assignment
Matlab Auditory Demonstrations:
http://www.dcs.shef.ac.uk/~martin/MAD/docs/mad.htm
bm:
This was pretty cool. I noticed that even the pure tones generated a bit of energy in the higher frequencies, but I assume that is because the pure tone is effectively "windowed" by being stored in a finite-length .wav file.
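The windowing guess can be illustrated with a small experiment: take a finite chunk of a sine wave and measure how much of its spectral energy lands well above the tone frequency. Everything here (sample rate, chunk length, tone frequencies, cutoff, function names) is an assumption for illustration and has nothing to do with how the bm demo itself works.

```python
import cmath
import math

def dft_magnitudes(x):
    """Magnitudes of the DFT bins from 0 Hz up to the Nyquist frequency."""
    n = len(x)
    return [
        abs(sum(x[k] * cmath.exp(-2j * math.pi * b * k / n) for k in range(n)))
        for b in range(n // 2 + 1)
    ]

def energy_fraction_above(tone_hz, cutoff_hz, sample_rate=8000, n=256):
    """Fraction of spectral energy above cutoff_hz for an n-sample sine chunk."""
    x = [math.sin(2 * math.pi * tone_hz * k / sample_rate) for k in range(n)]
    mags = dft_magnitudes(x)
    bin_hz = sample_rate / n
    total = sum(m * m for m in mags)
    above = sum(m * m for b, m in enumerate(mags) if b * bin_hz > cutoff_hz)
    return above / total

# 1000 Hz fits a whole number of cycles into 256 samples; 1010 Hz does not.
aligned = energy_fraction_above(1000.0, 2000.0)
misaligned = energy_fraction_above(1010.0, 2000.0)

print(aligned, misaligned)
```

When the chunk cuts the tone off mid-cycle, a small but clearly nonzero share of the energy shows up above 2 kHz, which is consistent with the windowing explanation.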
detuning:
I actually found this demo annoying because the tones were unpleasant to listen to. I'm afraid that I didn't match Moore's data at all! As you can see, all my points were near the origin. Outside of that area, I really heard two tones for the "mistuned" sound.
ti:
There were a lot of different things to try for this demo. I'm just going to comment on what I found particularly interesting.
pure tone: I really had to work to hear this as anything other than an interrupted tone. Only when the tone was interrupted with random energetic noise bursts did I get the sense that the pure tone continued through the noise.
siren: On this one, I definitely heard the siren as continuous more easily than I heard the pure tone as continuous. With random bursts, I could lower the energy of the bursts and still get the same effect.
speech: I heard continuous speech almost no matter what I did with this example. It was pretty eerie. The bandwidth, loudness, and duration of the white noise didn't really seem to matter. As far as intelligibility is concerned, shorter gaps made the speech more intelligible. I didn't hear a big difference in intelligibility when I changed the white noise bandwidth.
broad band noise: For this one, noise at about 5 dB seemed to work for any duration.
narrow band noise: This one seemed to need more powerful noise across the board.
music: Oddly enough, I found that I needed really energetic, long noise episodes to get the feeling of continuity.
Project
Work on the project can be found on my project page here.
Christine Smit