I defended my thesis, "Characterization of the singing voice from polyphonic recordings," on January 10, 2011.
To study the singing voice, researchers have traditionally relied on lab-based experiments, simplified models, or both. Neither method can reasonably be expected to consistently capture the essence of true performances in ecologically valid settings. Unfortunately, true performances are much more difficult to work with because they lack precisely the controls that lab setups and artificial models afford. In particular, true performances are usually polyphonic, which makes individual voices much harder to analyze than when they can be studied in isolation.
This thesis approaches the problem of polyphony head-on, using a time-aligned electronic score to guide estimation of the vocal line's characteristics. First, the fundamental frequency track of the voice is estimated, using the score notes as guides. Second, the harmonic strengths are estimated from the fundamental frequency information. Third, the estimates within each note are automatically validated or discarded based on characteristics of the frequency tracks. Finally, the validated harmonic estimates are smoothed across time to improve their accuracy.
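To make the first two steps more concrete, here is a minimal, hypothetical Python sketch of one common way to do score-guided analysis on a single audio frame. It is not the algorithm from the thesis. It assumes the score note has already been converted to a frequency (`score_hz`), searches only within one semitone of that note for the strongest spectral peak (a simplification that an accompanying instrument playing the same pitch could fool), and then reads harmonic strengths off a Hann-windowed spectrum at integer multiples of the estimated fundamental. All function names and parameters are illustrative.

```python
import cmath
import math


def windowed_magnitude(frame, freq_hz, sample_rate):
    """Amplitude of `freq_hz` in `frame`, from a Hann-windowed DTFT evaluated at one frequency."""
    n = len(frame)
    acc = 0j
    win_sum = 0.0
    for i, x in enumerate(frame):
        w = 0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1))  # symmetric Hann window
        win_sum += w
        acc += w * x * cmath.exp(-2j * math.pi * freq_hz * i / sample_rate)
    # A sinusoid of amplitude A gives |acc| of about A * win_sum / 2, so rescale to A.
    return 2 * abs(acc) / win_sum


def estimate_f0_near_score(frame, sample_rate, score_hz, semitones=1.0, steps=41):
    """Return the candidate frequency with the largest magnitude within +/- `semitones` of the score note."""
    lo = score_hz * 2 ** (-semitones / 12)
    hi = score_hz * 2 ** (semitones / 12)
    best_f, best_mag = lo, -1.0
    for k in range(steps):
        f = lo * (hi / lo) ** (k / (steps - 1))  # log-spaced candidate grid
        mag = windowed_magnitude(frame, f, sample_rate)
        if mag > best_mag:
            best_f, best_mag = f, mag
    return best_f


def harmonic_strengths_db(frame, sample_rate, f0, n_harmonics=5):
    """Level of harmonics 1..n_harmonics in dB relative to the fundamental."""
    mags = [windowed_magnitude(frame, h * f0, sample_rate) for h in range(1, n_harmonics + 1)]
    ref = max(mags[0], 1e-12)  # guard against a silent frame
    return [20 * math.log10(max(m, 1e-12) / ref) for m in mags]
```

In this sketch, restricting the search to a narrow band around the score note is what lets the score help with polyphony: energy from other instruments outside that band cannot be mistaken for the voice's fundamental. A real system would also need to track the fundamental across frames and handle vibrato, which this per-frame example ignores.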
These final harmonic estimates, along with the fundamental frequency track estimates, parameterize the essential characteristics of what we hear in singers' voices. To explore the potential power of this parameterization, the algorithms are applied to a real data set consisting of five sopranos singing six arias. Vowel modification and evidence for the singer's formant are explored.
Audio demos from my work
As mentioned in the abstract, my thesis starts with a polyphonic recording, such as:
What makes this particularly cool is that all of this information is extracted from a polyphonic recording, which hasn't really been done before.