Dan Ellis - Sensing and Imaging: the Sound of Science

Dave Meyers
March 15, 2015

Go shopping with a friend and you will be bombarded by a cacophony of sounds: your shared dialogue, the chatter of other shoppers, the rhythmic hum of in-store music, and even the buzzes and beeps of cell phone alerts. These represent just a handful of acoustic signals your brain automatically processes for you. Is it possible that machines could learn this intuitive process?

That is a question for Dan Ellis, professor of electrical engineering and founder of the Laboratory for Recognition and Organization of Speech and Audio (LabROSA) at Columbia Engineering. As the leader of the nation’s only lab to combine research in speech recognition, music processing, signal separation, and content-based retrieval for sound processing in machines, Ellis is making a lot of noise. His work pioneered the use of statistical classification of audio data to categorize videos by their soundtracks. Now he is leading a group of researchers in investigating how to create an intelligent machine listener able to interpret live or recorded sound of any type in terms of the descriptions and abstractions that would make sense to a human listener.

“We’ve performed work in supporting speech recognition in noisy environments, which has obvious commercial applications in things like better voice-control systems or searching soundtrack databases for particular utterances,” Ellis says. “But we’re also very interested in other kinds of sounds, which, in general, have been neglected by research in favor of speech.”

Recent research at LabROSA, led by Ellis, has included the classification of videos based on their soundtracks. Instead of categorizing videos based on speech, this research allows machines to extract information from whatever sounds are present, which is useful in the categorical organization of large collections of consumer-style videos.
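The core idea is that a short audio clip can be reduced to acoustic features and then assigned a label. As a minimal, hypothetical sketch (not LabROSA's actual system), the snippet below uses one classic feature, spectral flatness, to tell tonal, music-like sounds from noise-like ones; the 0.3 threshold and the two labels are illustrative assumptions.

```python
import numpy as np

def spectral_flatness(signal):
    """Geometric mean over arithmetic mean of the power spectrum.
    Close to 1 for noise-like sounds, close to 0 for tonal sounds."""
    spectrum = np.abs(np.fft.rfft(signal)) ** 2 + 1e-12  # floor avoids log(0)
    geometric = np.exp(np.mean(np.log(spectrum)))
    arithmetic = np.mean(spectrum)
    return geometric / arithmetic

def classify_clip(signal, threshold=0.3):
    """Crude soundtrack label based on a single feature (illustrative only)."""
    return "tonal" if spectral_flatness(signal) < threshold else "noise"

# Two synthetic one-second clips at an 8 kHz sample rate.
sr = 8000
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 440 * t)                     # a 440 Hz sine: strongly tonal
noise = np.random.default_rng(0).standard_normal(sr)   # white noise: spectrally flat
```

A real soundtrack classifier would pool many such features over time and learn the decision boundary from labeled data, but the pipeline shape is the same: waveform in, features out, label assigned.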

“This means we’ll be able to search for unspoken audio similarly to how we search for spoken audio,” Ellis explains. “As society gathers more and more raw media recordings and demands easier, more effective retrieval, I see a lot of potential for commercial applications of such technology.”

With the development of very powerful machine learning techniques like deep neural networks—sets of algorithms in machine learning used to model complex abstractions in data—it has become necessary to access significant volumes of data. That is why Ellis and his team at LabROSA are currently developing new techniques to accelerate the process of classifying those data troves.
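To make "model complex abstractions in data" concrete, here is a toy forward pass through a small network, written in plain numpy. The shapes (40 input features per clip, 16 hidden units, 2 sound classes) and the random weights are illustrative assumptions; in practice the weights would be learned from exactly the large labeled datasets the paragraph describes.

```python
import numpy as np

rng = np.random.default_rng(42)

def relu(x):
    # Nonlinearity that lets stacked layers represent non-linear abstractions
    return np.maximum(0.0, x)

def softmax(x):
    # Convert raw scores to class probabilities, stably
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# Hypothetical layer sizes: 40 audio features in, 16 hidden units, 2 classes out.
W1, b1 = 0.1 * rng.standard_normal((40, 16)), np.zeros(16)
W2, b2 = 0.1 * rng.standard_normal((16, 2)), np.zeros(2)

def forward(features):
    hidden = relu(features @ W1 + b1)   # learned intermediate representation
    return softmax(hidden @ W2 + b2)    # probability over the two classes

probs = forward(rng.standard_normal((3, 40)))  # a batch of 3 feature vectors
```

Each added layer of this kind re-represents its input at a higher level of abstraction, which is what makes large training collections so valuable: more parameters and layers demand more labeled examples to fit.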

Motivated by an intrinsic interest in improving our understanding of audio—especially music—and how it varies, Ellis is largely focused on developing a series of new software libraries and annotated data that are expected to make a particularly significant impact on the music audio research community. “We’re excited by the prospect of helping people organize and manage their personal music collections, and helping people discover new music based on their listening preferences,” Ellis explains.

The origin of Ellis’s passion for music and electronics can be traced back to his childhood experiences. “While attending a music-centric school in England, I took lessons in piano, harp, bassoon, and percussion. I also took up electronics as a hobby, and I was particularly fascinated when a friend showed me his electronic synthesizer,” Ellis recalls. “I remember it very clearly, and for me, it presented the ideal intersection between the musical sounds that I loved and electronics technology.”

Although music and engineering may seem like an odd pairing, for Ellis, they have more in common than meets the eye—or ear.

“I think the whole notion of technical research as being independent from the kind of creative exploration we expect from artists is a serious mistake,” he asserts. “The essence of research is identifying and exploring new ideas that have been overlooked, or coming up with novel and more powerful solutions.” To do that, Ellis applies an interdisciplinary focus.

“Before coming to Columbia, I hadn’t considered the value of departments beyond engineering,” he admits. “But I find myself regularly collaborating with colleagues from Columbia’s Sound Arts program and with researchers in the College of Physicians and Surgeons. Plus, being in New York has afforded me opportunities to interact with organizations like Google and Spotify. Together, we are contributing to the city’s positioning as a mecca for new sound-processing technology.”