Dan Ellis : Talk Slides
This page points to the slide packs for talks I've given since the start of 1997. It also acts as a kind of diary of all the talks I've given: even when several talks have almost identical slidepacks, I list them individually here.
The slides are PDF format, mostly created by Keynote (except for older talks). A few have embedded sound files (edited in by hand), as noted.
Robustness, Separation, and Pitch
(at the Morganfest, ICSI Berkeley, 2015-03-14)
This was a symposium to celebrate Morgan's many contributions and influences. I spoke about how I'd come to ICSI in 1996 interested in pitch but not knowing much about statistical models, and learned to build a career based on speech recognition technologies. But I still think there's more use we could make of pitch.

Detecting proximity from personal audio recordings
(presented by Zhuo Chen at Interspeech'14, Singapore, 2014-09-14)
We investigated the possibility of using personal audio lifelogs (i.e., continuous recordings from your cell phone) to tell when people are "nearby" in a crowded environment. We made some recordings at a noisy poster session at a local workshop, then showed that brute-force cross-correlation gave similar results to a much more scalable fingerprinting approach.
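As a rough illustration of the brute-force baseline (a minimal sketch, not the system from the paper; the function and its parameters are my own), proximity can be scored by the peak of the normalized cross-correlation in short windows:

```python
# Sketch: score whether two time-aligned personal recordings heard the
# same nearby sounds, via per-window peak normalized cross-correlation.
# Window/lag settings are illustrative, not taken from the paper.
import numpy as np

def window_xcorr_score(x, y, sr, win_sec=10.0, max_lag_sec=0.1):
    win = int(win_sec * sr)
    max_lag = int(max_lag_sec * sr)
    scores = []
    for start in range(0, min(len(x), len(y)) - win, win):
        a = x[start:start + win].astype(float)
        b = y[start:start + win].astype(float)
        a = (a - a.mean()) / (a.std() + 1e-9)   # zero-mean, unit-variance
        b = (b - b.mean()) / (b.std() + 1e-9)
        xc = np.correlate(a, b, mode='full') / win
        mid = len(xc) // 2                       # index of zero lag
        scores.append(np.abs(xc[mid - max_lag:mid + max_lag + 1]).max())
    return float(np.mean(scores)) if scores else 0.0
```

The quadratic cost of dense cross-correlation like this is exactly what makes the fingerprinting alternative so much more scalable.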

Enhancement of Very Noisy Speech Sounds (with embedded sound examples)
(at the IARPA Babel OP2 kick-off meeting, Linthicum, MD, 2014-07-09)
In Team Swordfish's final appearance in the Babel program, I spoke about our efforts to handle the noisy distant-mic conditions included in this spring's evaluation. Included is an introduction to the new "flat-pitch" processing I'm trying to get working.

Computational Auditory Scene Analysis
(invited talk at the BKN25 Symposium, McGill University, Montreal, Quebec, 2014-07-07)
It was a huge honor and a huge pleasure to participate in this symposium to celebrate the 25th anniversary of the (coincidental) publication of seminal books by Al Bregman, Carol Krumhansl, and Eugene Narmour, in 1990. Al's book, Auditory Scene Analysis, was a huge influence on me and essentially set the agenda for my research career. In this talk, I tried to give an overview of the entire field of Computational Auditory Scene Analysis (computer scientists trying to implement the mechanisms of human perception illuminated and described in his book), as well as introducing some of the related techniques for handling sound mixtures that have emerged in the interim.

LabROSA Research Overview
(invited talk at MERL, Cambridge, MA, 2014-06-12)
Jonathan Le Roux invited me to come visit him at MERL, to talk with him and John Hershey about a slightly wacky plan to use robots to record simulated cocktail parties with real, non-stationary room acoustics, and to give a talk highlighting recent work in my lab on music, environmental sound, and speech enhancement.

FPGAs
(at Douglas Repetto's Programming and Electronics for Art & Music class, 2014-04-17)
I got curious about Field-Programmable Gate Arrays -- essentially little custom digital VLSI chips that you can configure on the fly -- when thinking about how to build a zero-latency digital filter, and after Brian Whitman explained to me what they were. Douglas wanted to know too, so he asked me to give a brief (and decidedly non-expert) introduction to his class of artist-engineers.

Audio Lifelogging
(at the IDSE "Speed Dating" evening, Columbia University, 2014-01-29)
The Institute for Data Science and Engineering (IDSE) is hosting a series of so-called "speed dating" events, where a selection of faculty give 5 minute overviews of their research to promote possible cross-departmental collaborations. I chose to talk about audio lifelogging, the opportunities for extracting information from continuous recordings of a person's acoustic environment -- as well as the problems, such as privacy impact. This is something we first worked on in 2004, but I feel it's even more timely now.

The State of Music at LabROSA
(at NEMISIG-2014, Columbia University, 2014-01-25)
NEMISIG is an informal one-day meeting we initiated in 2008 to bring together students and researchers in our area interested in music information. On its sixth anniversary, to the day, we brought it back to Columbia for a well-attended event. Several PIs gave brief "lab overview" talks; this is mine.

Acoustic Analysis of Babel Recording Conditions
(at Babel mid-phase PI meeting, Linthicum, MD, 2014-01-09)
The Babel project involves creating speech recognizers for a wide range of world languages. The training data is being newly collected in many different countries, and also spans a range of recording conditions. I did some exploration of the acoustic properties of these recordings to see if there were systematic differences that could impact recognition performance.

Recognizing and Classifying Environmental Sounds
(invited keynote, CHiME workshop, Vancouver, 2013-06-01)
I prepared a new version of my talk on environmental sound classification for this workshop associated with the CHiME "multisource environment" speech evaluation challenge. I tried to take a broader perspective, including some details of different kinds of evaluations, as well as a bit more analysis of some of the different techniques we have tried.

Subband Autocorrelation Features for Video Soundtrack Classification
(at ICASSP 2013, Vancouver, 2013-05-29)
I gave the presentation for Courtenay's paper on our auditory model features for classifying environmental sound clips.

Data-Driven Music Understanding
(Invited talk at Brooklyn Tech High School, 2013-05-15)
This was an "Engineering Master Class", part of our outreach efforts at local high schools. The talk outlines our work in trying to get insights from large musical databases.

On Communicating Computational Research
(Invited talk at Scholarly Communication Program Seminar, Columbia University, 2013-04-04)
This was a very interesting panel discussing the big questions posed by the increasing importance of software in academic research. I tried to give my perspective as a participant in this kind of work.

Augmenting and Exploiting Auditory Perception for Complex Scene Analysis
(Invited talk at the Defense Science Research Council workshop on Electronic Enhancement of Sensory Dead Space, Arlington VA, 2013-03-28)
I was invited to participate in this workshop that advises DARPA on scientific frontiers; the topic was ways to use technology to enhance individual senses. My talk covers some basics of auditory/acoustic scene analysis in humans and machines, and speculates about future "super sense" augmentations.

The State of Music at LabROSA
(at NEMISIG 2013, held at the offices of The Echo Nest in Somerville, MA, 2013-01-26)
This is the small, informal, annual regional meeting of people in the north-east who are doing music informatics. I gave a quick overview of music-related projects in the lab, and tried to discuss what I see as the future promising areas for academic music information research.

Music Information Retrieval for Jazz
(Invited talk at Columbia's Center for Jazz Studies, 2012-11-15)
Despite its promising title, this talk was a first introduction aimed at jazz musicologists, explaining what MIR techniques exist and how they might be useful for jazz, as a starting point for a recently-started project applying them to jazz -- i.e., we don't know yet. I also have the slides with embedded audio, but I can't figure out how to make audio transport controls appear in the PDF, so it may not be all that useful.

Recognizing and Classifying Environmental Sounds
(Invited talk at MERL's Speech and Audio in the North East (SANE) workshop, Cambridge, MA, 2012-10-24)
A summary of our work on recognizing environmental sounds, particularly for video classification by soundtrack.

Handling Speech in the Wild
(Invited talk at the Hearing Research Seminar, Boston University, 2012-10-05)
This is a broad overview of recent work related to processing speech embedded in noisy environments, delivered to Steve Colburn's group at Boston University.

Mining Audio
(Invited talk at the Data to Solutions seminar, Columbia, 2012-09-14)
We have a new program in "Data to Solutions", and this was my introduction to the way that "big data" problems appear in audio - looking first at managing large music collections, then discussing the issues of video classification and retrieval by soundtrack features.

Pitch Tracking by Subband Autocorrelation Classification
(at Interspeech 2012, Portland, 2012-09-11)
This is the poster I presented for our Interspeech paper on trained subband autocorrelation pitch tracking.

Inharmonic Speech: A Tool for the Study of Speech Perception and Separation
(at SAPA-SCALE 2012, Portland, 2012-09-08)
This was the talk I gave for the paper I did with Josh McDermott and Hideki Kawahara on using the STRAIGHT analysis-synthesis framework to create "realistic" speech tokens where the voiced speech was composed of inharmonically-arranged components.

Mining Large-Scale Music Data Sets
(invited talk at ITA-2012, San Diego, 2012-02-09)
ITA is a very nice invitation-based workshop organized by UCSD each year. Gert Lanckriet put together a session on music information retrieval; my (short) talk was about beat chroma and cover song search in the million song dataset.

Engineering & the World
(4th grade science class, The School at Columbia University, 2012-01-19)
I had the chance to address my son's 4th grade class. I wanted to just present engineering as an activity, and explain how it differs from science. I had a few secondary goals, including tying in to their robotics curriculum. I had fun making the presentation, although it didn't go much like I expected.

Perceptually-Inspired Music Audio Analysis
(Invited talk, WISSAP, Indian Institute of Science, Bangalore, 2012-01-07)
Although practicalities prevented me actually going to Bangalore for the 2012 Winter School on Speech and Audio Processing, I was able to deliver this lecture via videoconference. It covers a few aspects of music audio processing, attempting to link them to what we know of human auditory scene analysis. (18MB, includes linked sound examples.)

Speech Separation for Recognition and Enhancement
(at the DARPA Language Technologies Day, Tampa FL, 2011-10-27)
This was intended as a pitch for the significance of complex acoustic scenes ("Speech in the Wild"), and the importance of thinking about ways for separating and organizing them. Includes very brief reviews of separation by spatial cues, pitch, and source models.

Research at LabROSA
(at EE Research Overview Day, Columbia University, 2011-09-09)
My brief talk introducing projects at the lab as part of our department's annual research overview day.

Joint Audio-Visual Signatures for Web Video Analysis
(at National Geospatial Intelligence Agency Annual Research Symposium, 2011-08-31)
Presentation of our joint project on video classification including the TRECVID MED2010 system, at the annual NGA sponsored projects review.

Environmental Sound Recognition and Classification
(Keynote talk, Hands Free Speech Communication and Microphone Arrays 2011, Edinburgh, 2011-06-01)
A longer talk describing getting information out of soundtracks and environmental recordings, at the workshop that combines speech recognition with multimicrophone techniques.

Using the Soundtrack to Classify Videos
(Invited talk, Visual Analytics Consortium 2011 meeting, University of Maryland, 2011-05-04)
A shortened version of my talk on classifying environmental sounds, for this panel session discussing the development of "multimedia analytics" - the science of how people can effectively and efficiently extract information from multimedia content.

Extracting Information from Sound
(Invited talk, Digital Media Analysis, Search and Management DMASM 2011 International Workshop, CalIT2 center, UC San Diego, 2011-03-01)
An overview of our work in classifying environmental sound, at a workshop put together by NTT Japan, on multimedia content analysis (I participated via videoconference).

Music Audio Research at LabROSA
(Invited talk, NEMISIG-2011, Drexel University, Philadelphia, 2011-01-28)
A brief overview talk on the current music projects in my group, given at the third North East Music Information Special Interest Group meeting.

Joint Audio-Visual Signatures for Web Video Analysis
(at National Geospatial Intelligence Agency Annual Research Symposium, 2010-09-14)
Presentation of our joint project on video classification, at the annual NGA sponsored projects review.

Research at LabROSA
(at EE Research Overview Day, Columbia University, 2010-09-10)
A 10 minute introduction to the work in my lab for our department's annual research overview day.

A History and Overview of Machine Listening
(Invited talk, Computational Audition workshop, UCL Gatsby unit, 2010-05-12)
I was asked to give a review talk at this private workshop organized by Josh McDermott and Maneesh Sahani. I found it pretty difficult to try to bring together all the different threads and technologies that impinge on machine listening, but I had a go.

Using Speech Models for Separation
(Invited talk, Acoustical Society Meeting, Baltimore, 2010-04-20)
Another talk based on the work of Ron Weiss and Mike Mandel, at a special session on understanding speech in interference organized by Carol Espy-Wilson at the spring Acoustical Society meeting.

Music Audio Research at LabROSA
(Invited talk, NEMISIG-2010, NYU, 2010-01-29)
A brief overview talk on the current music projects in my group, given at the second North East Music Information Special Interest Group meeting.

Some projects in real-world sound analysis
(Invited talk, NYU, New York, 2009-12-10)
Condensed version of my talk summarizing Ron and Mike's work, plus quick descriptions of current work in soundtrack and music analysis.

Using Speech Models for Separation
(Invited talk, Johns Hopkins University, Baltimore, 2009-10-13)
A mashup of my previous Models talk, and slides pulled from the defenses of Ron Weiss and Mike Mandel.

Learning, Using, and Adapting Models in Scene Analysis
(Invited talk, Scene Analysis Workshop, Berlin Institute of Advanced Studies, 2009-04-23)
This was my research talk at an interesting workshop bringing together both auditory and vision researchers to try to thrash out the question of scene analysis. The talk tries to make the case that models of source behavior are the way to conquer uncertainty in mixtures.

Sequential Organization from an Ecological Perspective
(Invited talk, Scene Analysis Workshop, Berlin Institute of Advanced Studies, 2009-04-24)
This was my short talk at the workshop, arguing that auditory streaming is a natural and useful response to the problem of acoustic scene analysis.

Mining for the Meaning of Music
(Invited talk, Music Technology Seminar, NYU, 2008-10-17)
Latest version of my talk on mining for interesting structure in beat-chroma representations; this version gives the first results of using NMF on beat-chroma matrices. (Includes embedded sound examples for Acrobat 6+.)

Data-driven Music Understanding
(at Columbia University Family Weekend, New York, 2008-10-03)
An overview of our music analysis work aimed at a lay audience of visiting families.

LabROSA Chord Recognition System
(at ISMIR 2008 MIREX poster session, Drexel Univ., Philadelphia, 2008-09-17)
A small poster describing my fairly predictable trained chord recognition system that came 2nd in the MIREX chord evaluation.

Research in Sound Analysis
(at EE Research Overview Day, Columbia University, 2008-09-05)
A 10 minute introduction to the work in my lab (via 3 examples) for our annual overview day.

Cross-Correlation of Beat-Synchronous Representations for Music Similarity
(at ICASSP-2008, Las Vegas, 2008-04-03)
Describes the experiments with using cover-song detection as a basis for finding similar songs (not intended as covers) in large music databases.

Mining for the Meaning of Music
(Invited talk, Distinguished Lecturer Series, Centre for Interdisciplinary Research on Music, Mind, and Technology, McGill University/Université de Montréal, Montreal, 2008-03-27)
Expanded and extended talk on the project to cluster beat-chroma extracts from a large number of pop-music tracks.

Music Research at LabROSA
(at the EE department Open House, 2008-03-14)
A slightly modified version of my ten-slide overview of music research in my lab. This PDF has the sound examples embedded (Acrobat 6+ compatible).

Current Music Research at LabROSA
(at the North East Music Information Special Interest Group (NEMISIG) meeting at Columbia University, 2008-01-25)
A ten minute summary of the music-related projects currently active in my lab, as part of the mutual presentations that started this informal meeting we organized.

Searching for Similar Phrases in Music Audio
(at the DMRN+2 one day workshop, Centre for Digital Music, Queen Mary, University of London, 2007-12-18)
First talk to actually present some results from the idea of chopping beat-chroma representations into little pieces, clustering them (in this case into LSH bins), and seeing which clusters are used most often.
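To make the pipeline concrete, here is a minimal sketch (all names and sizes are my own illustrations, not the talk's) of hashing short beat-chroma patches into sign-based LSH bins and counting bin usage:

```python
# Sketch: chop a beat-chroma matrix into short patches, hash each patch
# into an LSH bin via random hyperplanes, and count bin occupancy.
# patch_beats and n_bits are illustrative choices.
import numpy as np

def lsh_bin_counts(beat_chroma, patch_beats=4, n_bits=16, seed=0):
    rng = np.random.default_rng(seed)
    dim = patch_beats * beat_chroma.shape[1]
    planes = rng.standard_normal((n_bits, dim))      # random hyperplanes
    counts = {}
    for i in range(len(beat_chroma) - patch_beats + 1):
        patch = beat_chroma[i:i + patch_beats].ravel()
        patch = patch - patch.mean()
        bits = tuple(bool(b) for b in (planes @ patch > 0))
        counts[bits] = counts.get(bits, 0) + 1       # bin usage count
    return counts
```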

Using Source Models in Speech Separation
(at the Next-Generation Statistical Models and Inference for Speech and Audio Signal Processing, organized by Patrick Wolfe, Radcliffe Institute for Advanced Study, Cambridge MA, 2007-11-09)
My talk at a very interesting small workshop that brought together speech scientists, engineers, and statisticians, to look for common ideas and problems. My talk covers work on using ASR models to recognize mixtures (Ron Weiss's work), and recovering spatial information in reverb (Mike Mandel's work).

Extracting and Using Music Audio Information
(Invited ECE Seminar as guest of Gert Lanckriet, UC San Diego, 2007-11-02)
This talk surveys our work in the past five years on extracting information from music audio, and the further goals of estimating music similarity and discovering underlying structure. When I was putting it together, I was surprised by how much there was to say!

The 2007 LabROSA Cover Song Detection System
(Poster presented at the MIREX session at ISMIR-07, Vienna)
Describes the latest incarnation of the cover song detection system, which improves detection by almost 50% over the 2006 system by trying different tempos and a few other tweaks. (See the corresponding paper).

Classifying music audio with timbral and chroma features
(Poster presented at ISMIR-07, Vienna)
Poster presenting our work on using models of the distribution of chroma features (i.e. modeling chords as points in 12-dimensional space) as a way to distinguish artists. Chroma features are much less accurate than MFCC (timbral) features, but there is a benefit in combining them, suggesting that chroma features capture a different kind of information. (See the corresponding paper).
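For readers new to chroma, here is a minimal sketch of the representation itself (parameter choices are illustrative, not the paper's): fold the energy of each FFT bin onto one of 12 pitch classes.

```python
# Sketch: a 12-bin chroma vector from an FFT magnitude spectrum, by
# mapping each bin's frequency to a MIDI note number modulo 12.
import numpy as np

def chroma_from_spectrum(mag, sr, n_fft, fmin=55.0, fmax=2000.0):
    freqs = np.fft.rfftfreq(n_fft, 1.0 / sr)
    chroma = np.zeros(12)
    for f, m in zip(freqs, mag):
        if fmin <= f <= fmax:
            midi = 69 + 12 * np.log2(f / 440.0)   # A440 = MIDI note 69
            chroma[int(round(midi)) % 12] += m ** 2
    return chroma / (chroma.sum() + 1e-9)          # normalize energy
```

Modeling the distribution of such vectors over a song captures something about its harmony, which is why it complements timbral MFCCs even when it is weaker on its own.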

Analysis of Everyday Sounds
(Invited seminar, Kodak Research Labs, Rochester NY, 2007-07-24)
Description of our work on analyzing environmental sounds from personal audio recorders, and more recently from the soundtracks of short consumer-shot videos, which we've fused with video analysis by our colleagues to get remarkably usable automatic tags.

Using Sound Source Models to Organize Mixtures
(Invited talk, ASIP-NET.DK meeting, Denmark Technical University, 2007-05-24)
Revised version of my talk arguing for models as the basis for source separation, as part of a one-day meeting on Computational Auditory Scene Analysis and other advanced perceptual models for sound organization and separation.

Beat-Synchronous Chroma Representations for Music Analysis
(Invited talk, Intelligent Sound workshop, Karlslunde, Denmark, 2007-05-23)
Expanded version of ICASSP cover song talk with some more discussion of other applications of the beat-chroma representation.

Identifying Cover Songs with Beat-Synchronous Chroma Features
(ICASSP-07, Hawai'i, 2007-04-20)
All-new material describing the problem of cover songs, how to calculate chroma features and track beats with dynamic programming, and how to match beat-chroma matrices. Also available as smaller version without embedded audio examples.
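The "beat-synchronous" step itself is simple: average the per-frame chroma within each inter-beat interval, giving one 12-dimensional column per beat, so that matching two songs becomes cross-correlation of compact matrices over beat lags. A minimal sketch, assuming the per-frame chroma, frame times, and beat times are already computed:

```python
# Sketch: collapse per-frame chroma to one column per beat by averaging
# the frames that fall in each inter-beat interval.
import numpy as np

def beat_sync_chroma(chroma_frames, frame_times, beat_times):
    # chroma_frames: (n_frames, 12); returns (n_beats - 1, 12)
    cols = []
    for t0, t1 in zip(beat_times[:-1], beat_times[1:]):
        seg = chroma_frames[(frame_times >= t0) & (frame_times < t1)]
        cols.append(seg.mean(axis=0) if len(seg) else np.zeros(12))
    return np.array(cols)
```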

Fingerprinting to Identify Repeated Sound Events in Long-Duration Personal Audio Recordings
(ICASSP-07, Hawai'i, 2007-04-19)
This poster describes our work on using the Shazam noise-robust fingerprinting scheme to blindly and efficiently identify recurring sound events such as telephone rings and radio jingles from long-duration "personal audio" archives as might be captured by a body-worn continuous recorder. (Warning: PDF is 72"x42")
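In outline (a drastically simplified sketch of the landmark idea, not the actual Shazam system: the real scheme picks several peaks per region and quantizes its hashes), the fingerprinting works like this:

```python
# Sketch: landmark fingerprinting. Pick prominent spectrogram peaks,
# then hash pairs of nearby peaks as (f1, f2, dt) triples anchored at
# time t1. Matching audio shares many hashes at a consistent offset.
import numpy as np

def landmark_hashes(spec, fan_out=3):
    # spec: (n_freq, n_time) magnitude spectrogram
    peaks = []
    for t in range(1, spec.shape[1] - 1):
        f = int(np.argmax(spec[:, t]))               # one peak per frame
        if spec[f, t] > spec[f, t - 1] and spec[f, t] > spec[f, t + 1]:
            peaks.append((f, t))
    hashes = []
    for i, (f1, t1) in enumerate(peaks):
        for f2, t2 in peaks[i + 1:i + 1 + fan_out]:  # pair nearby peaks
            hashes.append(((f1, f2, t2 - t1), t1))
    return hashes
```

Repeated events such as telephone rings then show up as clusters of identical hashes whose anchor times differ by a constant offset.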

Sound Organization by Source Models in Humans and Machines
(Invited talk, NIPS Workshop on Advances in Models of Acoustic Processing, Whistler BC, 2006-12-09)
An outline of the arguments for and issues arising from organizing sound mixtures based on source models, i.e., some representation of the limited acoustic variation expected from real sounds.

Cover Song ID with Beat-Synchronous Chroma Features
(ISMIR MIREX session, Victoria BC, 2006-10-11)
This poster describes my submissions to the 2006 Music Information Retrieval Evaluation tasks on tempo estimation, beat tracking, and cover song identification. In the event, I was unable to attend the session, but the poster was intended as a substitute so that people could find out a little about what we did.

Minimal-Impact Personal Audio Archives
(Microsoft Research "Memex day", Redmond WA, 2006-07-19)
Review of the personal audio work, as part of a one-day meeting for people involved in Microsoft's Digital Memories (Memex) program.

Extracting Information from Music Audio
(Invited talk at DTU/AAU Intelligent Sound workshop, Saeby, Denmark, 2006-05-22)
Overview of the work in getting information out of music audio conducted at LabROSA, as part of this workshop run by a very interesting Danish project.

Auditory Scene Analysis in Humans and Machines
(Tutorial at the AES Convention, Paris, 2006-05-20)
Tutorial on auditory scene analysis and source separation in humans and machines.

VQ Source Models: Perceptual & Phase Issues
(Invited talk at special session on source separation organized by Shoji Makino, ICASSP-06, Toulouse, 2006-05-16)
Some highlights from our work trying to use VQ codebooks to separate and enhance speech.

Using Learned Source Models to Organize Sound Mixtures
(Invited talk at "New Ideas In Hearing" workshop, Ecole Normale Superiere, Paris, 2006-05-12)
The basic argument that scene analysis must rely on prior knowledge, which we can consider as being encapsulated in some kind of internal models of the real acoustic world.

Model-Based Separation in Humans and Machines
(Invited talk at the special session on approaches to audio separation organized by Emmanuel Vincent, ICA-2006, Charleston SC, 2006-03-08)
Comparing human performance on source separation with different automatic approaches, and arguing for (a) using models, and (b) concentrating on the content, not the signal per se.

Music Information Extraction
(Invited talk as guest of Ozgur Izmirli at Connecticut College, 2006-02-13)
Overview of various threads of our music-related research. This PDF includes soundfiles (for Adobe Reader 7+).

Speech Separation in Humans and Machines
(Opening keynote, ASRU-05, San Juan Puerto Rico, 2005-11-28)
An overview of the problem of separating speech in acoustic mixtures, including some perceptual results, brief introductions to ICA and CASA, and a pitch for model-based analysis (all drawing heavily on other sources). Includes soundfiles that can be played with Adobe Acrobat 6 and later (with the free Multimedia Package option, needs Adobe Reader 7.0.5 on Mac OS X; I don't know about other platforms).

Enhancing the Intelligibility of Speech in Speech Noise
(JHU CLSP Summer Workshops '06 planning meeting, 2005-11-11)
A brief introduction to a project we are proposing on integrating different source separation techniques to improve the intelligibility of speech separated from interfering speech.

Extracting Information from Music Audio
(CAIP seminar, Rutgers University, 2005-10-26)
A slightly expanded version of this talk on various projects to get information out of audio, given during a visit to Larry Rabiner and colleagues at CAIP.

Extracting Information from Music Audio
(Columbia Applied Math seminar, 2005-09-20)
Overview of our goals in analyzing music signals, highlighting the classifier-based melody extraction and SVM song-based artist ID from the recent MIREX evaluation.

MIREX 2005: What did we learn?
(At the MIREX panel, ISMIR-05, 2005-09-14)
Some text-only slides summarizing my comments on this year's Music Information Retrieval Evaluation (MIREX-05).

Computational Auditory Scene Analysis
Model-Based Scene Analysis
(Univ. Oldenburg, Germany, 2005-06-30)
Two talks on signal separation at a forum of hearing aid researchers and developers. I came at the invitation of Prof. Birger Kollmeier, who organized the meeting.

Searching and Describing Audio Databases
(Google New York, 2005-05-23)
A summary of topics potentially interesting to the researchers at this lab, including music information extraction and similarity matching, meeting recordings, and personal audio "lifelog" archives.

Sound Analysis Research at LabROSA
(Columbia EE New Grad Student Open House, 2005-04-01)
Stitching together highlights from a few recent talks to give an overview of the research in my lab for an open house.

Transforming Spontaneous to Read Speech
(EARS STT Meeting, Philadelphia, 2005-03-24)
Two-slide description of a project still in its very early stages attempting to improve speech recognition by 'normalizing' variability in speech due to different speaking styles. This is based on the idea that informal speech shows shallower formant modulations than read speech, so if we modify the speech in something like the formant-frequency domain, perhaps we can make it easier for conventional recognizers to handle. In the event, I never actually showed these slides!

Clap detection and discrimination for rhythm therapy
(IEEE ICASSP-05, Philadelphia, 2005-03-22)
Describes a project to distinguish someone clapping close to the microphone from others clapping in the room; in preparing the talk, I made it work much better than we reported in the paper!

What can we Learn from Large Music Databases?
(Music Information Processing Systems, a workshop at NIPS-2004, Whistler, BC, 2004-12-18)
My first attempt to spell out some kind of vision of what we are trying to do with several of the music-related projects from the past couple of years, and why working with hundreds of hours of music is exciting.

Learning for Scene Analysis
Integrating CASA with Other Systems
Evaluating Speech Separation Systems
(At the Second Montreal Workshop on Speech Separation in Complex Acoustic Environments, Montreal, PQ, 2004-11-05)
A set of talks from this meeting I helped organize. The first, from the introductory session, argues for the importance of machine-learned knowledge in signal separation and scene analysis. The second discusses how Computational Auditory Scene Analysis may be integrated with other separation mechanisms, and the last one summarizes a session trying to launch a debate on unified evaluation standards.

Minimal-Impact Audio-Based Personal Archives
(At the First Workshop on Continuous Archiving and Recording of Personal Experiences, Columbia University, NY, 2004-10-15)
A slightly more detailed talk on our work on accessing recordings made by body-worn audio recorders, including examples of speech scrambling, and a screenshot of the improved visualization/user interface.

Eigenrhythms: Drum Track Bases
(At the International Conference on Music Information Retrieval, Barcelona, Spain, 2004-10-14)
Describes the project on extracting the basic drum patterns from a large set of MIDI pop music renditions, and trying to describe them with a reduced-dimensional set of basis patterns. The talk includes some examples of basis functions derived from ICA, LDA, and NMF which go beyond what was described in the paper. Although I made the slide pack, due to a change in the schedule I wasn't able to give the talk; my co-author John Arroyo stepped in.

Segmenting and Classifying Long-Duration Recordings of "Personal Audio"
(At the Workshop on Statistical and Perceptual Audio Processing (SAPA-2004), Jeju, Korea, 2004-10-03)
First full talk on the project for segmenting, classifying, and accessing near-continuous recordings collected by a body-worn audio recorder.

Audio & Music Research at LabROSA
(Visiting Nebojsa Jojic, Microsoft Research, Redmond WA, 2004-08-24)
Survey of current projects including speech features work, music analysis, and 'personal audio', given during a visit to our collaborator at Microsoft Research.

Recent/future EARS research at Columbia
(EARS NA retreat, Bodega Bay Lodge, 2004-08-05)
Brief overview of some ideas for the "novel approaches" to speech recognition being developed as part of this consortium.

Audio & Music Research at LabROSA
(Queen Mary, University of London, 2004-07-29)
Overview of some current non-speech-recognition projects in the group, given to colleagues in the Centre for Digital Music at QMUL.

Multimedia Applications of Audio Recognition
(Multimedia Workshop at Columbia, New York, 2004-06-18)
Quick overview of three projects in LabROSA: Speaker turn segmentation, segmenting long-duration `personal audio' recordings, and modeling the space of drum patterns in pop songs.

Speaker Turns from Between-Channel Differences
(NIST Meeting Recognition Workshop, at ICASSP, Montreal, 2004-05-17)
Outlines our work on using timing differences between arbitrarily-placed tabletop mics to recover patterns of speaker turns in meetings. Companion to the paper Speaker Turn Segmentation based on Between-Channel Differences.
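The underlying cue is easy to sketch (a minimal illustration under my own framing and parameter choices, not the paper's exact processing): the lag of the cross-correlation peak between two channels clusters by talker position, so changes in the lag mark speaker turns.

```python
# Sketch: per-frame inter-channel lag (in samples) from the peak of the
# short-time cross-correlation between two tabletop microphones.
import numpy as np

def frame_lags(ch1, ch2, sr, frame_sec=0.5, max_lag_ms=5.0):
    frame = int(frame_sec * sr)
    max_lag = int(max_lag_ms * sr / 1000)
    lags = []
    for start in range(0, min(len(ch1), len(ch2)) - frame, frame):
        a = ch1[start:start + frame].astype(float)
        b = ch2[start:start + frame].astype(float)
        xc = np.correlate(a, b, mode='full')
        mid = len(xc) // 2                           # zero-lag index
        window = xc[mid - max_lag:mid + max_lag + 1]
        lags.append(int(np.argmax(np.abs(window))) - max_lag)
    return np.array(lags)
```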

Audio signal recognition for speech, music, and environmental sounds
(Invited talk, Special session on classification, 146th meeting of the Acoustical Society of America, Austin, 2003-11-13)
This is basically a tutorial on statistical pattern recognition and how it can be used for sound recognition, including a very fast overview of speech recognition. The special session was mainly aimed at non-audio people, including underwater acoustics people.

Sound, mixtures, learning: A perspective on CASA
(At NSF Speech Separation Workshop, Montreal, 2003-11-02)
This was a very interesting workshop I co-organized bringing together engineers, psychologists, neurophysiologists, and machine learning types, to talk about the problem of separating mixed audio signals. My talk was towards the end, and got successively pared down as others made the points I had in mind. In the end it's just a vision of the sound organization problem, a description of the multisource decoder idea, and a few slides on evaluation and tasks.

Chord Transcription with EM-Trained HMMs
(At Int. Conf. on Music IR ISMIR-03, Baltimore, 2003-10-29)
LabROSA had three talks at this year's ISMIR, including one on this work by Alex Sheh that I put together and presented since Alex couldn't make it to the conference.

Semantic Audio Analysis
(Panelist, Audio Engineering Society Special Session, New York, 2003-10-13)
Mark Sandler has created a new technical committee within the AES to address Semantic Audio Analysis, and I was one of three panelists at a session to discuss what this could actually mean.

Ideas for Next-Generation ASR
(NSF Workshop on Next-Generation ASR, Georgia Tech, 2003-10-07)
Although I was a participant in Chin-Hui Lee's NSF workshop to map the course of speech recognition, I never actually presented these slides. I had prepared them "in case" and as a way to organize my thoughts. All my points were raised in one way or another at the workshop.

Sound, Mixtures, Learning: LabROSA Overview
(GRASP Lab, U. Pennsylvania, Philadelphia PA, 2003-09-26)
I went to visit Lawrence Saul who does machine learning for sound analysis at UPenn. For this talk, I added a couple of new slides on Manuel's latest work on graphical models for speech separation.

Sound, Mixtures, Learning: LabROSA Overview
(CUNY Graduate Center, Speech & Hearing Program, New York NY, 2003-09-24)
At the invitation of Glenis Long, I finally got the chance to meet with some of the other hearing-oriented researchers in Manhattan. For this talk, I added a couple of new slides on speech-recognition work (FDLP, MI feature selection).

Sound, Mixtures, Learning: LabROSA Overview
(Microsoft Research, Redmond WA, 2003-07-23)
While I was on the west coast, I took the opportunity to zip up to Seattle to visit Manuel, who was interning there, and Nebojsa Jojic, who he was working with. This overview talk is very similar to the one I gave at Google, except I filled in the novel visualization of long-time-scale "LifeLog"-style audio (slide 29).

Machine Recognition of Sounds in Mixtures
(Stanford Hearing Seminar, CCRMA, Palo Alto, 2003-07-22)
Since I was in the neighborhood, I also stopped by Malcolm Slaney's Hearing Seminar and gave an extended version of the talk I presented at the ASA Nashville meeting on interpreting speech recognition as a useful kind of CASA problem, and using missing-data recognition as a way to incorporate high-level knowledge into sound scene organization.

Sound, Mixtures, Learning: LabROSA Overview
(Google Inc., Mountain View, 2003-07-21)
During a trip to California, I stopped by to see Adam who was interning with Google, and gave an overview talk introducing the lab, and slanted towards the kind of large-database issues that are Google's forte.

Pattern Recognition Applied to Music Signals
(Johns Hopkins CLSP Summer School, Baltimore, 2003-07-01)
This was a short lecture introducing the basics of feature calculation and statistical pattern classification for audio tasks. Although the techniques are drawn from speech recognition, the particular example domain is detecting the singing within pop music, to tie in with the practical that the students did in the afternoon.

EARS Novel Approaches: FDLP feature results
(DARPA EARS project PI meeting, Boston, 2003-05-21)
My slides contributed to the ICSI Novel Approaches team presentation, detailing our first speech recognition results using the new FDLP features, as well as some results in using mutual information for feature selection.

Machine Recognition of Sounds in Mixtures
(Acoustical Society of America meeting, Nashville, 2003-04-29)
Presenting the work on which I collaborated with Jon Barker and Martin Cooke on recognizing speech in mixtures using missing data techniques and searching across possible segmentations. I'm trying to spin this as a different paradigm of CASA, since CASA-as-signal-separation seems problematic.

Scene Analysis for Speech and Audio Recognition
(MIT Speech & Hearing Program, 2003-04-16)
This talk focuses on several different approaches to handling sound mixtures: computational auditory scene analysis, multicondition training, and parallel-model-based techniques such as HMM decomposition and multisource decoding.

Sound, mixtures, and learning: LabROSA overview
(Universitat Pompeu Fabra, Barcelona, 2003-03-20)
Background on Auditory Scene Analysis and an overview of current projects and directions at LabROSA, given during my visit to Xavier Serra's Music Technology Group. Includes new slides on some music projects, which are of particular interest to them.

Modeling meeting turns
(M4 project meeting, Sheffield, 2003-01-30)
Describes some work I did on segmenting meeting transcriptions according to the patterns of speaker turns, and modeling the overall amount said by participants in each meeting with a measure of their innate 'talkativity'.

EARS Novel Approaches: New features, new units
(DARPA EARS project PI meeting, Berkeley, 2003-01-22)
Slides I contributed to the Novel Approaches (Rainbow) team presentation at the half-year meeting of PIs on the DARPA EARS project.

Sound, mixtures, and learning
(NCARAI Seminar, Naval Research Labs, Washington DC, 2002-11-04)
Latest version of my talk on recognizing separate components in sound mixtures.

The Quest for Ground Truth in Musical Artist Similarity
(International Conference on Music Information Retrieval ISMIR-02, Paris, 2002-10-16)
Describes the work we did on attempting to define a single matrix of similarities between 400 different pop music artists, including our survey website (musicseer.com) which attracted over 20,000 similarity judgments.

Recognition and Organization of Speech and Audio
(Advent research overview, Columbia, 2002-09-20)
A brief (20 minute) overview of the group and its research, including new slides on mutual information in speech, and the playola music recommendation website.

Meeting Recorder: Audio Processing
(M4 project meeting, IDIAP, Martigny, Switzerland, 2002-08-29)
The MultiModal Meeting Manager (M4) project is a European-funded project led by Steve Renals at Sheffield University. This was a general update meeting, 6 months into the project. I presented some of the audio processing and tools that have been developed at Columbia, between this project and the NSF "Mapping Meetings" project I am also involved in.

Sound, Mixtures & Learning
(Invited talk, AFOSR Workshop on Computational Audition, Ohio State University, 2002-08-10)
This was a small, specialized workshop on advanced sound processing and perceptual models; after introducing CASA, this talk proposes `sound fragment recognition' (i.e. missing-data recognition plus search across segregations) as an alternative to the signal-separation approach to CASA. The alarm recognition project is described as an example.

Audio Information Extraction
(At the Spoken Language Systems group, MIT, 2002-04-23)
Yet another general overview of LabROSA and its work; this version included more slides about the musical artist similarity work.

Sound, Mixtures and Learning
(Invited talk at the Learning Workshop, Snowbird, Utah, 2002-04-04)
Introduction to auditory scene analysis and its computational modeling, to speech recognition in noisy backgrounds, then some ideas about how to analyze general sound mixtures with the same techniques, and where to employ the machine learning ideas that are the theme of this yearly workshop.

Audio Information Extraction
(At the Center for Language and Speech Processing, Johns Hopkins University, Baltimore MD, 2002-03-28)
General overview of LabROSA and recent projects, as the guest of Sanjeev Khudanpur and Fred Jelinek.

General Soundtrack Analysis
(At the Foreign Broadcast Information Service, Reston VA, 2002-03-12)
This presentation was part of the kickoff of a 'virtual academic team' to help this government open source information monitoring service keep up with the increasing volume of multimedia material.

Audio Information Extraction
(At Mitsubishi Electric Research Laboratories, Murray Hill NJ, 2002-01-07)
Newest version of my 'getting to know me and LabROSA' talk, where I overview our research perspective and current and future projects.
The shorter sound examples referenced in the PDF are in this zip file, and should be unpacked into the same directory as the slides.

Mapping meetings: Columbia's plans
(At the Meetings Retreat, Port Ludlow WA, 2001-10-13)
Presentation introducing Columbia to the other partners in the Mapping Meetings project, including ideas about what we plan to do.

LabROSA Research Overview
(At the Columbia EE Signal Processing group Faculty Research Overview Seminar, 2001-09-28)
15 minute talk introducing the main ideas behind the lab, and a few current projects, addressed to interested graduate students.

Recognition and Organization of Speech and Audio
(hosted by Douglas Repetto at Columbia's Computer Music Center, 2001-09-18)
Basically a re-run of the slides I showed at NEC, but a rather different talk for this computer music class.

Recognition and Organization of Speech and Audio
(hosted by Brian Whitman at NEC Research Institute, Princeton NJ, 2001-08-16)
Latest version of my "get to know me" talk with more emphasis on the non-speech-recognition aspects of LabROSA's work.

Computational Models of Auditory Organization
(At the EU Advanced Course in Computational Neuroscience, Abdus Salam Center for Theoretical Physics, Trieste, Italy, 2001-08-09)
Also available as a ZIP file including all the linked sound examples.
A talk on high level auditory perception and efforts to model it. The audience were experts in neural system modeling, but the talk barely touches that field - a case of "and now for something completely different..."

Recognition and Organization of Speech and Audio
(hosted by Jont Allen at AT&T Shannon Labs, Florham Park NJ, 2001-06-21)
Updated version of talk introducing my new lab at Columbia and some of the current and planned projects. Includes slides on new projects such as multisource decoding, lyrics recognition and acoustic detection of meeting participant motion.

Tandem modeling investigations
(at the Respite project meeting, Saillon, Switzerland, 2001-01-25)
Brief slide pack updating my colleagues on the Respite project of recent work by myself and collaborators on Tandem acoustic modeling.

Recognition and Organization of Speech and Audio
(at Chin Lee's group, Lucent, Murray Hill, 2001-01-12)
A slightly improved version of my LabROSA introductory talk, including detail on the Tandem modeling approach. (I reused these slides for a talk at IDIAP in Martigny, Switzerland on 2001-02-01.)

Recognition and Organization of Speech and Audio
(at the Columbia Univ. EE dept., 2000-10-13)
This is a combined tutorial and seminar on the speech and audio processing themes that I will be researching in my new lab, dubbed LabROSA.

RESPITE: Tandem & multistream research
(at the Sphear/RESPITE research workshop, Mons, Belgium, 2000-09-16)
Review of work performed at ICSI (and OGI, CMU, Columbia...) in the preceding 6 months relevant to the RESPITE multistream speech recognition theme. Covers the latest experiments with Tandem modeling for the large-vocabulary SPINE task, as well as online normalization and foreign languages. Also mentions Barry Chen's work on multistream mixtures-of-experts, and Mike Shire's work on multicondition feature design.

Tandem Acoustic Modeling: Neural nets for mainstream ASR?
(at Sheffield University Speech & Hearing Group, 2000-06-20, then at ICSI Real lunch, 2000-06-27)
A discussion of the 'Tandem modeling' approach (feeding neural network outputs as features into HTK to do better than either approach alone). This is based on my ICASSP-2000 poster on the same topic, but has some new figures, partly in response to comments received during the poster session.
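In outline (a sketch of the idea only; the dimensions and the plain-PCA decorrelation are my simplifications), the tandem pipeline takes the neural net's phone posteriors, Gaussianizes them with a log, decorrelates them, and hands them to a conventional GMM-HMM system as if they were acoustic features:

```python
# Sketch: turn MLP phone-posterior outputs into "tandem" features for a
# GMM-HMM recognizer: log, mean-normalize, then PCA-decorrelate.
import numpy as np

def tandem_features(posteriors, n_keep=24, eps=1e-8):
    # posteriors: (n_frames, n_phones), rows roughly sum to 1
    logp = np.log(posteriors + eps)
    logp -= logp.mean(axis=0)                  # per-dimension mean removal
    cov = np.cov(logp, rowvar=False)
    vals, vecs = np.linalg.eigh(cov)           # PCA via eigendecomposition
    order = np.argsort(vals)[::-1][:n_keep]
    return logp @ vecs[:, order]               # decorrelated features
```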

Improved recognition by combining different features and different systems
(at AVIOS 2000, San Jose, 2000-05-24)
This was meant to be a relatively general-interest talk on the various ways that speech recognition can be improved by combining different approaches to the same problems. AVIOS (the American Voice Input Output Society) is a very applications-focused conference.

Content-based analysis and indexing for speech, sound & multimedia
(at the ICSI Real Lunch meeting, 2000-04-04)
I hadn't given a talk about my own work to my own group for a long time, so this was meant as an overview of the things I have been thinking about for the past year or so, and the direction in which I plan to go. Specifically: applying information retrieval to multimedia content, particularly sound mixtures that are broken up into objects using computational auditory scene analysis.

Speech Interfaces
(at the Human Centered Computing retreat, UCB, 2000-02-24)
John Canny of Berkeley CS has been organizing an initiative in Human Centered Computing -- roughly, the intersection of computer science, social sciences and design. This talk was to provide an overview of the state of speech recognition and some current projects at ICSI, emphasizing our highly collaborative nature. (n.b. it opens in full-screen mode - Ctl-L (or something like that) returns you to normal window view).

Sound Content Analysis
(at Shih-Fu Chang's group, Columbia University, 2000-02-08)
I was visiting this group at Columbia who are working on content-based indexing and retrieval based on image and video cues; it's an obvious match to my interest in audio content-based retrieval. This talk spanned speech recognition, auditory scene analysis, and my ideas for content-based analysis. (Be prepared to hit Ctl-L to get out of full-screen mode).

Jan 2000 European tour review
(at the ICSI real lunch meeting, 2000-02-01)
Before you know it, I'm back in Europe, attending the final meeting of the Thisl project and the end-of-year-one meeting for the Respite project. These slides, based on the ones I used at the meetings, are the overview and update I presented to the rest of the home team.

European tour review
(at the ICSI real lunch meeting, 1999oct05)
This was my brief slide pack reviewing the parts that I found interesting at Eurospeech and the two EU project meetings, as well as updating my colleagues on what we will be doing in those projects.

Thisl update
(at the Thisl meeting, Les Marecottes, Switzerland, 1999sep20)
My second meeting in Switzerland was a brief progress review of the Thisl project on spoken document retrieval. This is a very brief slide pack summarizing work at ICSI since the last meeting in June.

AURORA with a neural net etc.
(at the RESPITE/SPHEAR workshop, Les Marecottes, Switzerland, 1999sep13)
This was a private workshop for participants in the two European projects being managed by Phil Green of Sheffield. I am involved in RESPITE, and this brief talk described the work I've recently been doing on addressing the AURORA noisy digits task with neural net acoustic models, as well as a couple of other multistream-related projects going on at ICSI.

An overview of speech recognition research at ICSI
(at the Tampere University of Technology, 1999sep02)
My second talk at TUT gave a little background to ICSI and the realization group, a brief introduction to connectionist speech recognition, and a lightning tour of some research projects in speech recognition currently happening within the group.

CASA: Principles, practice & applications
(at the Tampere University of Technology, 1999sep01)
As the guest of Anssi Klapuri and the Tampere International Center for Signal Processing, I spent a few days at TUT and gave a couple of talks. This one is intended as an introduction to auditory scene analysis, computational modeling thereof, and some applications - including some speculation about content-based retrieval for nonspeech audio.

ICSI/Thisl progress report
(at the Thisl meeting, Sheffield UK, 1999jun24)
This brief report summarized the work at ICSI on the THISL project since the previous report in February. Specifically, the Thomson NLP parser was integrated into the GUI, we trained an MSG acoustic model on the BBC data, and I reported on some related projects and developments.

European projects update
(at ICSI Real Lunch Meeting, 1999feb11)
On my return from the European trip described below, I gave a lunch talk describing the meetings, and what I and others had said at them.

Current work at ICSI
(BBC R&D, London, and ICP Grenoble, France, 1999feb03-08)
I spent ten days in Europe attending meetings of the THISL and RESPITE projects - EU funded collaborations with European labs and ICSI - and another meeting to discuss a possible future project proposal involving many of the same partners. These slides were the ones I used when presenting our work at these meetings. I called it 'current work at ICSI', but of course it was a very limited subset, just the work related to those projects.

Broadcast News: Features & acoustic modelling
(SPRACH final review meeting, INESC Lisbon, Portugal, 1998dec15)
The 3 year EU collaborative SPRACH project ended in December 1998; our final review meeting was mainly taken up with a description of the system we had collectively submitted to the Broadcast News evaluation. I was describing just ICSI's contribution in the acoustic modelling (modulation-filtered spectrogram features and very large multi-layer perceptron classifiers).

Some aspects of the ICSI 1998 Broadcast News effort
(Part of the BN overview Real Lunch, 1998nov25)
After the crazy rush to fulfill our part in submitting a full LVCSR system to the 1998 NIST/DARPA Broadcast News evaluation, Morgan, Eric, Adam and I gave a lunch talk to the rest of the group to explain what all the fuss had been about. My part was about feature choice, large nets, and some preliminary work on whole-utterance filters (i.e. nonlinear segment normalization) and gender-dependence.

Speech Recognition at ICSI: Broadcast News and Beyond
(at Erv Hafter's Ear Club, UC Berkeley Psychology, 1998sep21)
Erv Hafter runs a seminar series as part of his UCB psychoacoustics group which I agreed to address. In the event, it turned out to be a fairly general talk about the Broadcast News task, our efforts (in conjunction with our European partners) to field a system in this year's evaluations, and other aspects of speech recognition that I thought would interest hearing scientists.

Review of September SPRACH/Thisl meetings
(at the Realization Group Lunch Meeting, ICSI, 1998sep09)
Morgan and I went to another pair of meetings in Cambridge, UK, for these two EU-funded projects. On return, I gave a brief review of what had been discussed and the projects' status at our lunch meeting, using these eight slides.

Auditory Scene Analysis: Phenomena, theories and computational models
(at the NATO Advanced Studies Institute on Computational Hearing, Il Ciocco, Italy, 1998jul11)
NATO has a fund to support scientific meetings with a 'tutorial' aspect. My colleague Steve Greenberg organized this 12 day meeting on hearing which ranged from anatomy and physiology through to speech recognition and auditory organization. I gave a 90 minute talk on auditory scene analysis on the last full day.

SPRACH/ThisL review
(parts of the EC project review meetings, 1998mar24/25, Mons, Belgium)
Projects funded by the European Commission `Framework' program have annual progress review meetings with external reviewers. We are currently subcontractors on two, SPRACH and ThisL, and we had back-to-back reviews for them. These are the few slides I contributed to each day's proceedings covering aspects of the work done at ICSI under these grants, and the single slide I used to summarize the ThisL project when making a trip report to the rest of our group on return.

ICSI Speech Technology
(at Randy Katz's group meeting, UCB CS dept, 1998feb26)
This group on campus is interested in using speech recognition in some demo applications for their work in scalable and mobile networking. I presented this one-page summary of what we do at ICSI and the tools we could share with them.

Visualization tools & demos and the ICSI Realization group
(ICSI Real Lunch, 1998feb12)
One of my projects since being at ICSI has been to encourage and support the proliferation of accessible demos of the research we do. To this end, I've developed a number of specific visualization tools within a Tcl/Tk + extensions framework. This talk served to publicize these tools, and to share my vision of "a demo on every desk".

Automatic audio analysis for content description & indexing
(MPEG-7 Symposium, San Jose, 1998feb04)
MPEG-7 is to be a new standard for the description and indexing of the content of multimedia 'assets' such as video and audio. I was invited to talk about my work in computational auditory scene analysis as one approach to extracting the kind of information that the standard might want to cover. You can learn more about MPEGs 1, 2, 4 and 7 at the MPEG home page.

ICSI/ThisL status report
(IDIAP, Switzerland 1997dec11)
The ThisL project (Thematic Indexing of Spoken Language) had an informal meeting. I went representing ICSI, and I gave a very brief presentation of some relevant work at ICSI: visualization/user-interface tools, recognizing in reverb by combining information at different time scales, and my speech-mixtures stuff.

Problems and future work for ASR-in-CASA
(Stanford Hearing Seminar 1997nov20 / Berkeley Ear Club 1997nov24)
A replacement for section 5 of the original Mohonk '97 talk, for the extended version I gave of that talk when I got back to California.

On the importance of illusions for artificial listeners
(Haskins 1997oct24 / NUWC Newport 1997oct25)
Forgive the title trying to be cute. This pack (again in Acrobat PDF) comprises slides for two talks I gave while 'out east' for Mohonk, basically just introducing my work. One talk was to Haskins Lab in New Haven - a bunch of very serious speech, hearing and language scientists who probably see this stuff as too applied, and the next was to a group of Navy sonar researchers who probably think this hearing modelling is extremely left-field, blue sky stuff. Most slides were the same in both talks, although the voice over differed!

Computational Auditory Scene Analysis exploiting Speech Recognizer knowledge
(Mohonk 1997oct22)
This is the actual presentation I made at the 1997 IEEE Mohonk Audio Workshop. The slides are in Acrobat PDF format - I got sick of translating them to HTML; hope that's OK.

Exploiting ASR in CASA
(ICSI 1997may21)
This was a lunchtime talk I gave to my colleagues at ICSI describing the paper I had submitted to IEEE WASPAA'97 on an idea for integrating a speech recognition engine into a computational auditory scene analysis system that is anticipating a mixture of speech and nonspeech sounds. The first big problem that came up was working the speech recognizer 'backwards' to recover an estimate of the speech spectrum from the recognized phoneme labels; the talk focuses mainly on this aspect.

Digital Audio
(Lego 1997may06)
This was a talk I gave at a mini workshop on digital audio hosted by Lego (the plastic brick people) at their headquarters in Billund, Denmark. They are looking into future generations of computer-based toys, and brought together a collection of researchers from industry and academia to brainstorm about audio in toys. My talk was supposed to provide an introduction and framework, focusing on synthesis.

Divisive issues in Computational Auditory Scene Analysis
(Stanford Hearing Seminar, 1997mar06)
This was a talk I gave at Malcolm Slaney's Stanford Hearing Seminar. It was intended to be a brief overview of research into computational models of auditory scene analysis, focusing on the distinctions between the different projects in this field.


Last updated: 2006-06-05
Dan Ellis <[email protected]>