A Model-Based Note Transcription System
Barry Rafkind - E6820 Speech & Audio Signal Processing - Final Project May 2005

Introduction

The ability to transcribe notes from music is a rare and desirable talent. Demand for this skill in the world of music is large, so an automatic note transcription system would be valuable.

The system that I have developed does not try to solve the entire note transcription problem. Rather, it focuses on a specific transcription task where the notes in the music are known a priori. This is accomplished by generating the music audio from MIDI files, which supply the ground-truth note labels. Another constraint in my system is that the music must involve multiple instruments, each playing at most one note at a time. For simplicity's sake, I have kept the number of instruments in my experiments at two, although the system should be able to handle an arbitrary number. The system does not break down if an instrument plays a chord, but it will transcribe only one of the chord's notes.

My procedure takes a model-based, additive approach. This means that a separate model is produced for each note. Note models are distinguished by their harmonic structure. Different notes have different fundamental frequencies. Models for the same note played by different instruments have different spectral structures, depending on the strength of the harmonics produced by each instrument. Once the note models are produced, the music audio is approximated by the best linear combination of note models. The note models receiving the largest weights in this sum are the best candidates for being present in the music. Note onset times help to time-align the note approximations with the music.
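
As a rough illustration of the additive idea, the sketch below fits one spectral frame as a weighted sum of note models and ranks the models by weight. It is not the code used in this project. The Gaussian-peak templates, the harmonic strengths, the instrument names, and the use of ordinary least squares (NumPy's lstsq) as the "best" fit are all assumptions made for the example.

```python
import numpy as np

def harmonic_template(f0, harmonic_weights, freqs, width=10.0):
    """Toy note model: Gaussian peaks at integer multiples of f0 (assumed shape)."""
    spectrum = np.zeros_like(freqs)
    for k, w in enumerate(harmonic_weights, start=1):
        spectrum += w * np.exp(-0.5 * ((freqs - k * f0) / width) ** 2)
    return spectrum

freqs = np.linspace(0.0, 4000.0, 2048)  # frequency axis in Hz

# Hypothetical note models: two pitches for each of two made-up instruments.
# The harmonic strengths stand in for each instrument's timbre.
models = {
    ("instr_A", "A4"): harmonic_template(440.0, [1.0, 0.5, 0.25], freqs),
    ("instr_A", "C5"): harmonic_template(523.25, [1.0, 0.5, 0.25], freqs),
    ("instr_B", "A4"): harmonic_template(440.0, [0.3, 1.0, 0.6], freqs),
    ("instr_B", "C5"): harmonic_template(523.25, [0.3, 1.0, 0.6], freqs),
}
names = list(models)
M = np.column_stack([models[n] for n in names])  # one column per note model

# Synthetic "music" frame: instr_A plays A4 while instr_B plays C5.
frame = 0.8 * models[("instr_A", "A4")] + 0.6 * models[("instr_B", "C5")]

# Least-squares weights for the linear combination of note models.
weights, *_ = np.linalg.lstsq(M, frame, rcond=None)

# The largest weights mark the most likely notes in this frame.
ranked = sorted(zip(names, weights), key=lambda pair: -pair[1])
top_two = [name for name, _ in ranked[:2]]
print(top_two)
```

In this toy setting the two notes that were mixed into the frame receive the largest weights. With real audio the fit is only approximate, and a non-negativity constraint on the weights may be a sensible variation.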

I have evaluated my results on 10 Bach Inventions using the Dixon Success Formula, which is equal to 100 x Number Correct / (Number Correct + False Positives + Missing Notes). In the following sections, I will explain the details of my system.
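
For concreteness, the success score quoted above can be computed as in the following minimal sketch. The function name, argument names, and the example counts are made up for illustration and are not results from my experiments.

```python
def dixon_success(num_correct, false_positives, missing_notes):
    """Success score: 100 x correct / (correct + false positives + missing)."""
    total = num_correct + false_positives + missing_notes
    if total == 0:
        # No reference notes and no detections: the score is undefined.
        raise ValueError("score is undefined when all counts are zero")
    return 100.0 * num_correct / total

# Hypothetical counts: 80 correct notes, 10 false positives, 10 missing notes.
score = dixon_success(80, 10, 10)
print(score)  # 80.0
```

Because false positives and missing notes both appear in the denominator, a system cannot raise its score simply by reporting more notes or fewer notes.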