Last Update: 03/20/2001 10:46:37 AM
EE6820 project page
Soccer field audio is usually noisy and is typically a mixture of two or more basic types.
[1] Albert S. Bregman, "Auditory scene analysis: hearing in complex environments", Thinking in sound: the cognitive psychology of human audition, Oxford University Press, 1993, pp. 10-36 <see reading summary Feb 6>
[2] John Saunders, "Real-time discrimination of broadcast speech/music", ICASSP 96 <see reading summary Feb 8>
[3] Dellaert, F.; Polzin, T.; Waibel, A., "Recognizing emotion in speech", ICSLP 1996
[4] Droppo, J.; Acero, A., "Maximum a posteriori pitch tracking", ICSLP 1998 <see reading summary Feb 26>
[5] Arons, Barry, "SpeechSkimmer: A System for Interactively Skimming Recorded Speech", ACM CHI 1997
[6] Chao Wang; Seneff, S., "Robust pitch tracking for prosodic modeling in telephone speech", ICASSP 2000
[7] Jonathan Foote, "Visualizing music and audio using self-similarity", Proc. ACM Multimedia 1999 (This paper may seem irrelevant to the topic, yet it is still interesting to read.) <see reading summary Feb 26>
[8] Yong Rui, Anoop Gupta, Alex Acero, "Automatically extracting highlights from TV baseball programs", Proc. ACM Multimedia 2000
[9] Eric Scheirer, Malcolm Slaney, "Construction and evaluation of a robust multi-feature speech/music discriminator", ICASSP 97 <see reading summary week 4>
[10] Ellis, D., & Williams, G., "Speech/music discrimination based on posterior probability features", Proc. Eurospeech-99, Budapest
[11] Carey, M.J.; Parris, E.S.; Lloyd-Thomas, H. "A comparison of features for speech, music discrimination", ICASSP-99
[12] Ellis, D., "Hard problems in computational auditory scene analysis", http://sound.media.mit.edu/~dpwe/writing/hard-probs.html
1. Video syntax analysis via audio cues
Type 1: audio-visual information centric, e.g. movies
    segment consistent chunks of audio data (speech/music) to form a complete video skim
    specific points of interest: silence detection, music/speech classification
Type 2: video information centric, e.g. soccer video
    3 kinds of audio: acclamation, whistle, commentary
    the presence of the first 2 kinds of events is usually a clue to an important happening or transition point in the game
    changes in the commentary (pitch and speed changes, narrator stops) are also useful
    Problem 1: how to segment these 3 types of sound?
2. Music watermarking
3. Constrained music analysis and synthesis
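
Silence detection, listed as a point of interest under item 1, could be approached with a short-time energy threshold. The following is only a minimal sketch under stated assumptions, not the method chosen for the project: the function names (frame_energies, detect_silence), the frame length, the hop size, and the -40 dB threshold relative to the loudest frame are all illustrative choices, and the input is a synthetic signal rather than real soccer or movie audio.

```python
import math


def frame_energies(samples, frame_len=400, hop=200):
    """Return the mean-square energy of each frame of a mono signal."""
    energies = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = samples[start:start + frame_len]
        energies.append(sum(x * x for x in frame) / frame_len)
    return energies


def detect_silence(samples, frame_len=400, hop=200, threshold_db=-40.0):
    """Flag a frame as silent when its energy is more than |threshold_db|
    decibels below the loudest frame (an assumed, illustrative rule)."""
    energies = frame_energies(samples, frame_len, hop)
    if not energies:
        return []
    peak = max(energies)
    if peak == 0.0:
        # The whole signal is digital silence.
        return [True] * len(energies)
    flags = []
    for e in energies:
        level_db = 10.0 * math.log10(e / peak) if e > 0.0 else float("-inf")
        flags.append(level_db < threshold_db)
    return flags


# Synthetic test signal at 8 kHz: 0.1 s of silence, then 0.1 s of a 440 Hz tone.
sr = 8000
signal = [0.0] * 800 + [math.sin(2 * math.pi * 440 * n / sr) for n in range(800)]
flags = detect_silence(signal)
print(flags)
```

On noisy field recordings a fixed relative threshold like this would likely need tuning, or replacing with an adaptive noise-floor estimate, since crowd noise rarely drops to true silence.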