Project Wish List

Last Update: 03/20/2001 10:46:37 AM
<[email protected]>


Segmentation / Classification of Soccer Field Audio



[1]    Albert S. Bregman, "Auditory scene analysis: hearing in complex environments", in Thinking in Sound: The Cognitive Psychology of Human Audition, Oxford University Press, 1993, pp. 10-36
        < see reading summary Feb 6 >

[2]    John Saunders "Real-time discrimination of broadcast speech/music", ICASSP 96
        < see reading summary Feb 8    >

[3]     Dellaert, F.; Polzin, T.; Waibel, A. "Recognizing emotion in speech" , ICSLP 1996

[4]    Droppo, J.; Acero, A., "Maximum a posteriori pitch tracking", ICSLP 1998
        < see reading summary Feb 26 >

[5]    Arons, Barry, "SpeechSkimmer: A System for Interactively Skimming Recorded Speech", ACM CHI 1997

[6]    Chao Wang; Seneff, S., "Robust pitch tracking for prosodic modeling in telephone speech", ICASSP 2000

[7]    Jonathan Foote, "Visualizing music and audio using self-similarity", Proc. ACM Multimedia 1999
        (This paper may seem unrelated to the topic, but it is still an interesting read.)
        < see reading summary Feb 26 >

[8]    Yong Rui, Anoop Gupta, Alex Acero, "Automatically extracting highlights from TV baseball programs", Proc. ACM Multimedia 2000

[9]    Eric Scheirer, Malcolm Slaney, "Construction and evaluation of a robust multi-feature speech/music discriminator", ICASSP 97
        < see reading summary week4 >

[10]   Ellis, D., & Williams, G., "Speech/music discrimination based on posterior probability features", Proc. Eurospeech-99, Budapest

[11]   Carey, M.J.; Parris, E.S.; Lloyd-Thomas, H. "A comparison of features for speech, music discrimination", ICASSP-99

[12]   Ellis, D., "Hard problems in computational auditory scene analysis",

1. Video syntax analysis via audio cues

Type 1    Audio-visual information centric, e.g. movies
                Segment consistent chunks of audio (speech or music) to form a complete video skim.
                Specific points of interest: silence detection, music/speech classification.
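
As a rough illustration of the silence-detection step, here is a minimal sketch based on short-time energy. It is not taken from any of the papers above. The frame length, hop size, and the -40 dB threshold are assumptions chosen for 16 kHz audio scaled to [-1, 1], and the function names are hypothetical.

```python
import numpy as np

def frame_energy_db(signal, frame_len=400, hop=160):
    """Short-time energy in dB for each frame of a mono signal.

    frame_len and hop are assumed values (25 ms / 10 ms at 16 kHz).
    """
    signal = np.asarray(signal, dtype=float)
    n_frames = 1 + max(0, (len(signal) - frame_len) // hop)
    energies = np.empty(n_frames)
    for i in range(n_frames):
        frame = signal[i * hop : i * hop + frame_len]
        # Small constant avoids log10(0) on digitally silent frames.
        energies[i] = 10.0 * np.log10(np.mean(frame ** 2) + 1e-12)
    return energies

def detect_silence(signal, threshold_db=-40.0, frame_len=400, hop=160):
    """Boolean mask, True where a frame's energy is below the threshold.

    threshold_db is an assumed value and would need tuning on real material.
    """
    return frame_energy_db(signal, frame_len, hop) < threshold_db
```

Runs of consecutive True frames could then be merged into silence segments, which would give candidate boundaries for the speech/music chunks.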

Type 2    Video information centric, e.g. soccer video
                Three kinds of audio: acclamation (crowd cheering), whistle, commentary.
                The presence of either of the first two is usually a clue to an important event or a transition point in the game;
                changes in the commentary (pitch and speaking-rate changes, the narrator stopping) are also useful.
                Problem 1: how to segment these three types of sound?
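
One possible starting point for Problem 1 is to pick out the whistle first, since it is the most tonal of the three sounds. The sketch below flags frames whose spectrum is dominated by a single narrow peak. It is only an illustration under stated assumptions: the 2000-4500 Hz band, the 0.5 peak-energy ratio, and the function name are all hypothetical and would have to be checked against real soccer recordings.

```python
import numpy as np

def whistle_frames(signal, sr, frame_len=1024, hop=512,
                   band=(2000.0, 4500.0), min_peak_ratio=0.5):
    """Flag frames dominated by one narrowband peak inside `band`.

    band and min_peak_ratio are assumed values, not measured ones.
    """
    signal = np.asarray(signal, dtype=float)
    window = np.hanning(frame_len)
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / sr)
    in_band = (freqs >= band[0]) & (freqs <= band[1])
    n_frames = 1 + max(0, (len(signal) - frame_len) // hop)
    flags = np.zeros(n_frames, dtype=bool)
    for i in range(n_frames):
        frame = signal[i * hop : i * hop + frame_len]
        if len(frame) < frame_len:
            break  # signal shorter than one frame
        power = np.abs(np.fft.rfft(frame * window)) ** 2
        total = power.sum()
        if total <= 1e-12:
            continue  # silent frame, cannot be a whistle
        peak = int(np.argmax(power))
        # Energy in the peak bin and two neighbours on each side,
        # which covers the main lobe of the Hann window.
        lo, hi = max(0, peak - 2), peak + 3
        peak_ratio = power[lo:hi].sum() / total
        flags[i] = bool(in_band[peak]) and peak_ratio >= min_peak_ratio
    return flags
```

Broadband crowd noise spreads its energy over many bins and so should score a low peak ratio, while commentary would still need separate speech features (for example pitch tracking as in [4] and [6]) to be told apart from cheering.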

2. Music watermarking

3. Constrained music analysis and synthesis