E6820 Assignment 4

Reading assignment

Paper: "YIN, a fundamental frequency estimator for speech and music," Alain de Cheveigné and Hideki Kawahara, Journal of the Acoustical Society of America, 2002.

Summary:
The authors describe a six-step process for finding the pitch of a windowed signal. The first step is the standard autocorrelation technique; the other five steps attempt to fix problems associated with autocorrelation. Most of the improvement seems to come from step 2, the "difference function." The authors then show that YIN performs better than a number of other pitch estimators they compared it against.
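To make steps 2 through 4 more concrete, here is a minimal Python sketch of the difference function, the cumulative mean normalized difference, and the absolute-threshold search, as I understand them from the paper. Steps 1, 5, and 6 are left out. The function names, the integration window (all but the last `max_lag` samples), and the `threshold=0.1` default are my own assumptions for illustration, not the authors' code.

```python
import math


def difference_function(x, max_lag):
    """Step 2: d(tau) = sum over the window of (x[j] - x[j + tau])**2.

    Assumes len(x) > max_lag; the window is everything except the last max_lag samples.
    """
    window = len(x) - max_lag
    return [
        sum((x[j] - x[j + tau]) ** 2 for j in range(window))
        for tau in range(max_lag + 1)
    ]


def cumulative_mean_normalized_difference(d):
    """Step 3: d'(0) = 1, d'(tau) = d(tau) / ((1/tau) * sum of d(1..tau))."""
    out = [1.0]
    running = 0.0
    for tau in range(1, len(d)):
        running += d[tau]
        out.append(d[tau] * tau / running if running > 0 else 1.0)
    return out


def yin_period(x, max_lag, threshold=0.1):
    """Step 4: take the first lag whose d' dips below the threshold, then walk
    forward to the bottom of that dip. Falls back to the global minimum of d'."""
    dprime = cumulative_mean_normalized_difference(difference_function(x, max_lag))
    for tau in range(1, len(dprime)):
        if dprime[tau] < threshold:
            while tau + 1 < len(dprime) and dprime[tau + 1] < dprime[tau]:
                tau += 1
            return tau
    return min(range(1, len(dprime)), key=dprime.__getitem__)


if __name__ == "__main__":
    fs = 8000  # assumed sample rate for the demo
    x = [math.sin(2 * math.pi * 200 * n / fs) for n in range(400)]
    period = yin_period(x, max_lag=100)
    print(f"period = {period} samples, f0 = {fs / period:.1f} Hz")
```

Because the cumulative normalization keeps d' near 1 at small lags, the search does not get stuck on the trivial zero at lag 0, which is one of the autocorrelation problems the paper sets out to fix.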

Thoughts:
I find it interesting that the authors count an estimate as correct as long as it is within 20% of the reference pitch; only larger deviations are scored as gross errors. That's quite a generous margin, especially since the authors claim that YIN should work well with music. Raising a frequency by 20% is approximately the same as going up a minor third (3 semitones), which would be a huge transcription error.
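The minor-third comparison is easy to check: the size of an interval in equal-tempered semitones is 12 times the base-2 logarithm of the frequency ratio. A quick Python check (the function name is mine):

```python
import math


def ratio_to_semitones(ratio: float) -> float:
    """Size of the interval between two frequencies, in equal-tempered semitones."""
    return 12 * math.log2(ratio)


print(f"20% sharp: {ratio_to_semitones(1.2):.2f} semitones")
print(f"20% flat:  {ratio_to_semitones(0.8):.2f} semitones")
print(f"minor third as a frequency ratio: {2 ** (3 / 12):.4f}")
```

A 20% upward error works out to about 3.16 semitones, slightly more than a minor third (ratio of about 1.189). In the downward direction a 20% error (ratio 0.8) is about 3.86 semitones, closer to a major third.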

Step 5 (parabolic interpolation) is especially interesting because it accounts for periods that are not a whole number of samples. That is, this step allows YIN to find periods that are, for example, 100.3 samples long. Autocorrelation alone, which is only evaluated at integer lags, can't do that.
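The idea behind sub-sample periods can be sketched with a three-point parabola fit: given a local minimum at integer lag `tau` and its two neighbours, the vertex of the parabola through those three points gives a fractional lag. This is a minimal Python sketch of that vertex formula only; the function name is hypothetical, and I'm not claiming it matches the exact function the paper interpolates.

```python
def parabolic_vertex(values, tau):
    """Fit a parabola through values[tau-1], values[tau], values[tau+1]
    and return the (possibly fractional) lag of its vertex."""
    if tau < 1 or tau + 1 >= len(values):
        return float(tau)  # no neighbour on one side: keep the integer lag
    left, centre, right = values[tau - 1], values[tau], values[tau + 1]
    curvature = left - 2 * centre + right
    if curvature == 0:
        return float(tau)  # the three points are collinear, so there is no vertex
    return tau + 0.5 * (left - right) / curvature


if __name__ == "__main__":
    # A made-up curve whose true minimum sits between samples, at lag 100.3.
    values = [(t - 100.3) ** 2 for t in range(200)]
    print(parabolic_vertex(values, 100))
```

As a rough sense of scale: at a 44.1 kHz sample rate, a 100.3-sample period corresponds to about 439.7 Hz, whereas rounding to 100 samples would give 441 Hz.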


Practical assignment

Matlab Auditory Demonstrations: http://www.dcs.shef.ac.uk/~martin/MAD/docs/mad.htm

bm:
This was pretty cool. I noticed that even the pure tones produced a bit of energy at higher frequencies. I assume that is because the pure tone is effectively "windowed": it is a finite clip stored in a .wav file, so it switches on and off abruptly instead of lasting forever.
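One way to sanity-check the windowing explanation is to look at how much energy a truncated sine spreads away from its own frequency. This is a minimal Python sketch under my own assumptions (a 64-sample buffer, 8.5 tone cycles so the tone is cut off mid-cycle, a naive DFT, and a Hann fade for comparison); none of these values come from the MAD demo itself.

```python
import cmath
import math


def dft_power(x):
    """Naive O(N^2) DFT; returns |X[k]|**2 for k = 0 .. len(x)//2 (one-sided)."""
    n = len(x)
    return [
        abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))) ** 2
        for k in range(n // 2 + 1)
    ]


def leaked_fraction(cycles=8.5, n=64, guard=3, taper=False):
    """Share of spectral power lying more than `guard` bins from the tone.

    cycles: tone periods in the n-sample buffer (non-integer, so the edges are abrupt).
    taper:  if True, fade the tone in and out with a Hann window first.
    """
    tone = [math.sin(2 * math.pi * cycles * t / n) for t in range(n)]
    if taper:
        tone = [s * (0.5 - 0.5 * math.cos(2 * math.pi * t / (n - 1)))
                for t, s in enumerate(tone)]
    power = dft_power(tone)
    far = sum(p for k, p in enumerate(power) if abs(k - cycles) > guard)
    return far / sum(power)


if __name__ == "__main__":
    print(f"abrupt on/off: {leaked_fraction():.4f}")
    print(f"Hann fade:     {leaked_fraction(taper=True):.6f}")
```

With abrupt edges a noticeable share of the energy lands far from the tone's frequency, and fading the tone in and out shrinks that share considerably, which is consistent with the idea that the stray energy comes from the tone being cut off.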

detuning:
I actually found this demo unpleasant because the tones themselves were annoying. I'm afraid that I didn't match Moore's data at all! As you can see, all my points were near the origin. Outside of that small region, I clearly heard the "mistuned" sound as two separate tones.

ti:
There were a lot of different things to try for this demo. I'm just going to comment on what I found particularly interesting.

pure tone: I really had to work to hear this as anything other than an interrupted tone. Only when the tone was interrupted with random energetic noise bursts did I get the sense that the pure tone continued through the noise.

siren: On this one, I definitely heard the siren as continuous more easily than I heard the pure tone as continuous. With random bursts, I could lower the energy of the bursts and still get the same effect.

speech: I heard continuous speech almost no matter what I did with this example. It was pretty eerie. The bandwidth, loudness, and duration of the white noise didn't seem to matter much. As far as intelligibility is concerned, shorter gaps made the speech more intelligible, but changing the white noise bandwidth didn't make a big difference.

broad band noise: For this one, noise at about 5 dB seemed to work for any duration.

narrow band noise: This one seemed to need more powerful noise across the board.

music: Oddly enough, I found that I needed really energetic, long noise bursts to get a sense of continuity.


Project

Work on the project can be found on my project page.


Christine Smit