Here are some of the research projects that I'm currently working on or have worked on in the past (listed in more or less reverse chronological order).
Recently, I've been working on a system for multi-voice polyphonic transcription. The approach is based on an extension of non-negative matrix factorization (NMF), which has become a popular technique in the source separation and music information retrieval communities. Most NMF-based approaches to transcription only consider single instruments and/or assume that the source models are known a priori. I wanted to take a more general approach that would be able to handle multiple polyphonic sources simultaneously and would work with minimal assumptions. The approach I arrived at is somewhat reminiscent of eigenvoice modeling, where a set of source-dependent training models is used to learn a model subspace. This subspace is then used to constrain the search for models that fit the target mixture. While eigenvoice modeling is typically based on HMMs for the source models and PCA for the model subspace, I use NMF for both. The result is a new NMF variant that I call Subspace NMF.
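To make the subspace constraint concrete, here is a toy NumPy sketch of the general idea (my own illustration, not the exact algorithm from the paper, and all names are mine): the columns of U hold vectorized training bases (or a nonnegative subspace learned from them), and the target basis is restricted to the form W = unvec(U c) with nonnegative weights c, which are fit by multiplicative updates alongside the activations H.

```python
import numpy as np

def subspace_nmf(V, U, r, n_iter=500, eps=1e-9, seed=0):
    """Toy subspace-constrained NMF sketch: fit V ~ W H with W = unvec(U @ c).

    V: nonnegative data matrix (e.g. a magnitude spectrogram), shape (m, n)
    U: columns are vectorized (m*r,) training bases spanning the model subspace
    r: number of basis columns in the unvectorized W
    """
    rng = np.random.default_rng(seed)
    m, n = V.shape
    c = rng.random(U.shape[1]) + eps      # nonnegative subspace weights
    H = rng.random((r, n)) + eps          # nonnegative activations
    for _ in range(n_iter):
        W = (U @ c).reshape(m, r)         # current basis, constrained to subspace
        # standard multiplicative update for the activations H
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        # multiplicative update for c, from splitting the gradient of
        # ||V - W H||^2 (with W = unvec(U c)) into positive/negative parts
        num = U.T @ (V @ H.T).reshape(-1)
        den = U.T @ (W @ H @ H.T).reshape(-1)
        c *= num / (den + eps)
    return (U @ c).reshape(m, r), H
```

In this sketch the training stage is implied: each source's basis would itself be learned with ordinary NMF, and the stacked, vectorized bases give U.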
The basics of the algorithm and its application to music transcription appeared at the IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA 2009). You can read the paper or check out the poster on my publications page.
I ended up implementing a number of different NMF variants while working on this project, which I've put together into a Matlab toolbox called NMFlib. Currently, it provides implementations of the original Lee and Seung algorithms (both the Euclidean and I-divergence updates), Local NMF, Sparse NMF, Convolutive NMF, Convex NMF, NMF using the Amari alpha divergence, and NMF with beta divergences, as well as others. I'm making this code available in the hope that it will be useful to other researchers. Please let me know if you find any bugs or problems, or have suggestions for how the library might be improved.
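For reference, the two original Lee and Seung update rules at the core of the library fit in a few lines. Here is a NumPy sketch (an illustration only; NMFlib itself is written in Matlab):

```python
import numpy as np

def nmf(V, r, n_iter=500, cost="euclidean", eps=1e-9, seed=0):
    """Lee-Seung multiplicative updates for V ~ W H.

    cost="euclidean" minimizes ||V - W H||^2;
    cost="kl" minimizes the I-divergence (generalized KL divergence).
    """
    rng = np.random.default_rng(seed)
    W = rng.random((V.shape[0], r)) + eps
    H = rng.random((r, V.shape[1])) + eps
    for _ in range(n_iter):
        if cost == "euclidean":
            H *= (W.T @ V) / (W.T @ W @ H + eps)
            W *= (V @ H.T) / (W @ H @ H.T + eps)
        else:  # I-divergence updates
            H *= (W.T @ (V / (W @ H + eps))) / (W.sum(axis=0)[:, None] + eps)
            W *= ((V / (W @ H + eps)) @ H.T) / (H.sum(axis=1) + eps)
    return W, H
```

Both variants preserve nonnegativity because every update is a ratio of nonnegative terms; the other variants in the library mostly modify these ratios (sparsity penalties, convolutive bases, alternative divergences).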
For my final project in Dan's DSP class (E4810), I built a very simple system for dereverberating audio signals in a fully blind setting (i.e., with no prior knowledge of either the original clean signal or the transfer function that caused the reverberation). You can find sound examples, code, and other supporting material here. Currently, I'm working to improve the system and extend it to a filtering-based approach. Check back soon.
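As a rough illustration of what blind channel compensation can look like (a toy sketch of my own, not the actual course-project system), one simple approach works in the short-time log-spectral domain: the unknown transfer function contributes an additive, roughly constant offset to each frequency bin's log magnitude, so subtracting the per-bin mean across frames crudely equalizes it.

```python
import numpy as np

def blind_equalize(x, frame=256, hop=128):
    """Toy blind equalization: subtract each bin's mean log magnitude across
    frames (a crude estimate of the unknown channel, which also whitens the
    source's long-term spectrum), then resynthesize by overlap-add."""
    win = np.hanning(frame)
    n_frames = 1 + (len(x) - frame) // hop
    frames = np.stack([x[i * hop:i * hop + frame] * win for i in range(n_frames)])
    spec = np.fft.rfft(frames, axis=1)
    logmag = np.log(np.abs(spec) + 1e-12)
    eq = logmag - logmag.mean(axis=0)                 # remove estimated channel
    spec_eq = np.exp(eq) * np.exp(1j * np.angle(spec))  # keep original phase
    out = np.zeros(len(x))
    wsum = np.zeros(len(x))
    for i, f in enumerate(np.fft.irfft(spec_eq, n=frame, axis=1)):
        out[i * hop:i * hop + frame] += f * win
        wsum[i * hop:i * hop + frame] += win ** 2
    return out / np.maximum(wsum, 1e-12)
```

This only handles channels much shorter than the analysis frame; genuine reverberation, whose impulse response spans many frames, is what makes the real problem hard.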
The Haptic Guidance System (HAGUS) is a device that I built as part of my master's thesis at the MIT Media Lab. While the motivation for the project grew out of the FielDrum, HAGUS was designed and built for a set of experiments looking at the effect physical guidance has on motor learning in a musical context. The basic idea of the device (and my thesis project) is to be able to "record" and "play back" the drumstick motions involved in percussion performance. Motions of expert (or at least competent) players are recorded and then used to train novice players by guiding them through correct drumstick motions using the device's playback feature. I conducted a study with 40 participants with no prior percussion experience to see if this type of training paradigm is effective for teaching percussion. Does it work? You can read my thesis, the IEEE Haptics Symposium paper, or see the webpage for details, but the short answer is yes. The slightly longer answer is that, in terms of timing accuracy, there was a small but statistically significant (positive) difference between subjects who received guidance during training and those who did not. In terms of velocity (loudness) accuracy, there was a highly significant difference, with guided subjects incurring roughly 18% less error.
This project looked at how multilinear (tensor) algebra can be used to model the Head-Related Transfer Function (HRTF). HRTFs are filters that can be used to generate extremely convincing spatial audio. The problem is that they are person-specific (as they depend on anatomy) and time-consuming to collect. The basic idea of this project is to use the N-mode SVD, a higher-order extension of the Singular Value Decomposition, to do targeted dimensionality reduction of HRTF data, which is naturally multi-way (it is a function of sound direction, person, and ear). Once a low-dimensional model has been found, we can map simple anthropometric measurements (things like head width, pinna length, etc.) to person-specific model coefficients. This makes it easy to generate entire HRTF sets for people given only their anthropometric data and the model. For more detail, check out the 2007 ICASSP paper that I wrote with Alex Vasilescu.
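The decomposition at the heart of this can be sketched compactly (an illustration of the general truncated N-mode SVD, not the exact formulation from the paper): take an ordinary SVD of each mode-n unfolding of the data tensor, keep the leading left singular vectors as that mode's factor matrix, and project the tensor onto all of them to get the core.

```python
import numpy as np

def unfold(T, mode):
    """Mode-n unfolding: move the given mode to the front and flatten the rest."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def mode_mul(T, M, mode):
    """Mode-n product T x_n M: multiply M into the given mode of T."""
    return np.moveaxis(np.tensordot(M, np.moveaxis(T, mode, 0), axes=1), 0, mode)

def hosvd(T, ranks):
    """Truncated N-mode SVD: returns the core tensor G and per-mode factors U_n
    such that T ~ G x_1 U_1 x_2 U_2 ... (for HRTF data the modes would be
    direction, person, and ear)."""
    Us = [np.linalg.svd(unfold(T, n), full_matrices=False)[0][:, :r]
          for n, r in enumerate(ranks)]
    G = T
    for n, U in enumerate(Us):
        G = mode_mul(G, U.T, n)
    return G, Us
```

Given the truncated factors, a new person's HRTF set amounts to predicting their coefficients in the person mode, which is where the regression from anthropometric measurements comes in.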
This was the topic of my master's thesis at UCSC. I worked with my advisor David Helmbold to build a system for modeling the expressive performance of piano melodies. The model described in my thesis uses a hidden Markov model to learn the distributions of tempo and loudness changes that occur in different musical contexts. We also developed a second version of the model which uses hierarchical HMMs and incorporates phrase information. You can take a look at our Machine Learning paper for more details.
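As a toy illustration of the machinery involved (not the actual expressive-performance model, which has a far richer state and observation structure), here is the scaled forward algorithm for a discrete-output HMM, the basic inference step underlying this kind of model:

```python
import numpy as np

def forward_loglik(pi, A, B, obs):
    """Scaled forward algorithm: log P(obs) under a discrete-output HMM.

    pi:  initial state distribution, shape (n_states,)
    A:   transition matrix, A[i, j] = P(state j | state i), rows sum to 1
    B:   emission matrix, B[i, k] = P(symbol k | state i)
    obs: sequence of observed symbol indices
    """
    alpha = pi * B[:, obs[0]]
    loglik = np.log(alpha.sum())
    alpha /= alpha.sum()              # rescale to avoid underflow
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]  # propagate and absorb the observation
        s = alpha.sum()
        loglik += np.log(s)
        alpha /= s
    return loglik
```

In a performance model of this flavor, the hidden states would correspond to musical contexts and the observations to quantized tempo and loudness changes, with the usual HMM training and decoding algorithms built on top of this recursion.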