EE6820 Project: Segmentation / Classification of Soccer Field Audio (1)

                                Dataset and General Observations

1.    Dataset Description

Soccer: MPEG-1 system stream, 1.5-1.6 Mb/s, CBR
              Video: 352×240, 30 frames/s
              Audio: 44,100 Hz sampling rate, stereo (downmixed to mono and resampled to 16 kHz before processing)
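The audio preparation in the last line (stereo to mono, then a lower sampling rate) can be sketched in NumPy. This is a minimal sketch that assumes the target is a 16 kHz sampling rate and that the two channels are simply averaged. The function name `downmix_and_resample` is hypothetical, and the resampling shown is FFT-based (spectrum truncation), not necessarily what the project's tools used.

```python
import numpy as np

def downmix_and_resample(stereo, fs_in=44100, fs_out=16000):
    """Average the two channels, then resample by truncating the spectrum.

    stereo: array of shape (n_samples, 2). Returns a mono float array at fs_out.
    Dropping all bins above fs_out / 2 acts as the anti-aliasing low-pass filter.
    """
    mono = np.asarray(stereo, dtype=float).mean(axis=1)
    n_in = len(mono)
    n_out = int(round(n_in * fs_out / fs_in))
    spectrum = np.fft.rfft(mono)
    # Keep only the bins up to the new Nyquist frequency.
    spectrum = spectrum[: n_out // 2 + 1]
    # Rescale so that sinusoid amplitudes survive the change in length.
    return np.fft.irfft(spectrum, n=n_out) * (n_out / n_in)

# Usage: one second of a 1 kHz tone in both channels.
fs = 44100
t = np.arange(fs) / fs
tone = np.sin(2 * np.pi * 1000 * t)
y = downmix_and_resample(np.column_stack([tone, tone]))
print(len(y))  # 16000
```

Because it transforms the whole signal at once, this is only practical for short clips. For a 90-minute stream one would process blocks or use a polyphase filter such as `scipy.signal.resample_poly(mono, 160, 441)` (16000/44100 = 160/441).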

No.  Name       Source      Length    Language  Content description / Sound characteristics
1    Costa      FOX Sports  01:29:00  English   Last 75 minutes of 2002 U-Champion Latin America qualifier, Costa Rica vs. Guatemala, 8:1
                                                Sound: clear commentary, weak crowd noise
2    Argentina  FOX Sports  01:29:40  English   Last 75 minutes of Football Argentino, Los Andes vs. River Plate, 2:0
                                                Sound: clear commentary, moderate crowd noise; band-limited to ~5.5 kHz (original production or broadcast settings)
3    News2      MPEG-7      00:15:00  Spanish   Part of a news program
                                                Sound: clear commentary, strong crowd noise
4    Korea      MPEG-7      00:54:53  Korean    Commentary not understood (plays are usually short; the teams seem pretty rusty, though)
                                                Sound: commentary with heavy utterance; crowd very noisy and excited, with drums and shouting

2.    Observations

Intuitive hints about useful features can be read from the waveforms and spectrograms below. Things worth trying:
(a)    Time domain
        Amplitude information: total energy in a short time window; mean and variance of amplitude
        Zero-crossing rate (ZCR) and its 1st to 3rd order moments
        ... ...
(b)    Frequency domain
        Subband energy in spectrum (03/31/01)
        Subband energy distribution along frequency axis (frequency "discreteness" for distinguishing formant structure) 
(c)    Cepstrum and more complicated features
        MFCC, features incorporating an auditory model, ...
(d)    Posteriors from a speech recognizer
         Open questions: how would this perform at different noise levels, and with an unknown language ...
(e)    Try to find formant structure using pitch tracking.
        Useful if excited/unexcited commentator classification is desired. Overall very complicated; reported accuracy is about 74% (49 out of 66).
        It may be further confused by pseudo-formant structure in crowd noise (see Figure 2).
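The time-domain features in item (a) are cheap to compute per frame. A minimal sketch follows, assuming 16 kHz mono input, 25 ms frames with a 10 ms hop, and "amplitude" taken as the absolute sample value; all function names are hypothetical. The 1st to 3rd order moments of the ZCR track are taken here as mean, variance and skewness over a segment.

```python
import numpy as np

def frame_signal(x, frame_len=400, hop=160):
    """Split x into overlapping frames (25 ms frames, 10 ms hop at 16 kHz)."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = hop * np.arange(n_frames)[:, None] + np.arange(frame_len)[None, :]
    return x[idx]

def time_domain_features(frames):
    """Per-frame energy, mean/variance of absolute amplitude, and zero-crossing rate."""
    energy = np.sum(frames ** 2, axis=1)
    amp_mean = np.mean(np.abs(frames), axis=1)
    amp_var = np.var(np.abs(frames), axis=1)
    # A zero crossing is a sign change between consecutive samples.
    signs = np.signbit(frames).astype(np.int8)
    crossings = np.count_nonzero(np.diff(signs, axis=1), axis=1)
    zcr = crossings / (frames.shape[1] - 1)  # crossings per sample interval
    return energy, amp_mean, amp_var, zcr

def moments(track):
    """Mean, variance and skewness of a per-frame feature track over a segment."""
    mean = track.mean()
    var = track.var()
    skew = np.mean((track - mean) ** 3) / var ** 1.5 if var > 0 else 0.0
    return mean, var, skew

# Usage on synthetic stand-ins for real clips: a 1 kHz tone and white noise, 1 s at 16 kHz.
fs = 16000
n = np.arange(fs)
tone = np.sin(2 * np.pi * 1000 * n / fs + 0.3)
noise = np.random.default_rng(0).standard_normal(fs)
_, _, _, zcr_tone = time_domain_features(frame_signal(tone))
_, _, _, zcr_noise = time_domain_features(frame_signal(noise))
print(moments(zcr_tone), moments(zcr_noise))
```

A 1 kHz tone crosses zero 2000 times per second, so its ZCR is about 2000/16000 = 0.125 per sample, while white noise changes sign on about half of all sample pairs (ZCR near 0.5).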
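For item (b), a minimal sketch of per-frame subband energies and their distribution along the frequency axis. The band edges are hypothetical (not from the project), and the entropy of the normalized band energies is only one possible stand-in for frequency "discreteness": energy concentrated in a few bands gives low entropy, energy spread evenly gives high entropy.

```python
import numpy as np

# Hypothetical band edges (Hz) for 16 kHz audio; not taken from the project.
BAND_EDGES = [0, 500, 1000, 2000, 4000, 8000]

def frame_signal(x, frame_len=400, hop=160):
    """Split x into overlapping frames (25 ms frames, 10 ms hop at 16 kHz)."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = hop * np.arange(n_frames)[:, None] + np.arange(frame_len)[None, :]
    return x[idx]

def subband_energies(frames, fs=16000, edges=BAND_EDGES):
    """Energy of each frame's Hann-windowed power spectrum in each band."""
    window = np.hanning(frames.shape[1])
    power = np.abs(np.fft.rfft(frames * window, axis=1)) ** 2
    freqs = np.fft.rfftfreq(frames.shape[1], d=1.0 / fs)
    bands = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        # Half-open bands [lo, hi); the top band also keeps the Nyquist bin.
        upper = freqs <= hi if hi == edges[-1] else freqs < hi
        bands.append(power[:, (freqs >= lo) & upper].sum(axis=1))
    return np.stack(bands, axis=1)

def band_distribution(bands, eps=1e-12):
    """Normalize band energies per frame and measure their spread (entropy in bits)."""
    p = bands / (bands.sum(axis=1, keepdims=True) + eps)
    entropy = -np.sum(p * np.log2(p + eps), axis=1)
    return p, entropy

fs = 16000
n = np.arange(fs)
tone3k = np.sin(2 * np.pi * 3000 * n / fs)
noise = np.random.default_rng(0).standard_normal(fs)
p_tone, h_tone = band_distribution(subband_energies(frame_signal(tone3k)))
p_noise, h_noise = band_distribution(subband_energies(frame_signal(noise)))
print(h_tone.mean(), h_noise.mean())
```

For white noise the band energies are roughly proportional to bandwidth (1/16, 1/16, 1/8, 1/4 and 1/2 of the total), which gives an entropy near 1.9 bits, while a pure tone stays close to 0 bits.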
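Item (c) mentions MFCCs. The usual recipe is a windowed power spectrum, a triangular mel filterbank, a log, and a DCT. Below is a compact NumPy sketch of that recipe, assuming 26 mel filters, 13 coefficients, a Hamming window and the common 2595·log10(1 + f/700) mel formula. It is a generic illustration, not the project's feature extractor; in practice a library routine such as `librosa.feature.mfcc` would normally be used instead.

```python
import numpy as np

def frame_signal(x, frame_len=400, hop=160):
    """Split x into overlapping frames (25 ms frames, 10 ms hop at 16 kHz)."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = hop * np.arange(n_frames)[:, None] + np.arange(frame_len)[None, :]
    return x[idx]

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, fs):
    """Triangular filters equally spaced on the mel scale, shape (n_filters, n_fft//2 + 1)."""
    edges_hz = mel_to_hz(np.linspace(0.0, hz_to_mel(fs / 2.0), n_filters + 2))
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / fs)
    fb = np.zeros((n_filters, len(freqs)))
    for i in range(n_filters):
        lo, center, hi = edges_hz[i], edges_hz[i + 1], edges_hz[i + 2]
        rising = (freqs - lo) / (center - lo)
        falling = (hi - freqs) / (hi - center)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb

def mfcc(frames, fs=16000, n_filters=26, n_ceps=13):
    """MFCCs per frame: power spectrum -> mel energies -> log -> orthonormal DCT-II."""
    n_fft = frames.shape[1]
    power = np.abs(np.fft.rfft(frames * np.hamming(n_fft), axis=1)) ** 2
    log_mel = np.log(power @ mel_filterbank(n_filters, n_fft, fs).T + 1e-10)
    k = np.arange(n_ceps)[:, None]
    j = np.arange(n_filters)[None, :]
    dct = np.sqrt(2.0 / n_filters) * np.cos(np.pi * k * (2 * j + 1) / (2 * n_filters))
    dct[0] /= np.sqrt(2.0)
    return log_mel @ dct.T

noise = np.random.default_rng(0).standard_normal(16000)
frames = frame_signal(noise)
ceps = mfcc(frames)
louder = mfcc(10.0 * frames)
print(ceps.shape)  # (98, 13)
```

Scaling the input by 10 adds ln 100 to every log mel energy, so with this orthonormal DCT only c0 changes (by √26 · ln 100); c0 carries the overall level while the higher coefficients describe spectral shape.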
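Item (e) relies on a pitch tracker. As a point of reference, a simple autocorrelation F0 estimator with a voicing threshold is sketched below. It estimates the fundamental frequency, not formants, and it is a generic textbook method swapped in for illustration, not the tracker behind the reported 49/66 result. The search range (60 to 400 Hz), the 0.3 voicing threshold, the 40 ms frames and the function names are all assumptions. Running `voiced_fraction` on crowd-noise segments such as those in Figure 2 would be one way to check whether crowd noise triggers false voicing decisions.

```python
import numpy as np

def estimate_f0(frame, fs=16000, fmin=60.0, fmax=400.0, voicing_threshold=0.3):
    """Autocorrelation F0 estimate for one frame; returns 0.0 when judged unvoiced.

    fmin, fmax and voicing_threshold are assumed values, not tuned on this data.
    """
    frame = frame - np.mean(frame)
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    if ac[0] <= 0:
        return 0.0  # silent frame
    ac = ac / ac[0]
    lag_min = int(fs / fmax)
    lag_max = min(int(fs / fmin), len(frame) - 1)
    # The strongest autocorrelation peak in the allowed range gives the period.
    lag = lag_min + int(np.argmax(ac[lag_min:lag_max + 1]))
    return fs / lag if ac[lag] >= voicing_threshold else 0.0

def voiced_fraction(x, fs=16000, frame_len=640, hop=160):
    """Fraction of 40 ms frames that the estimator labels as voiced (F0 > 0)."""
    starts = range(0, len(x) - frame_len + 1, hop)
    f0 = np.array([estimate_f0(x[s:s + frame_len], fs) for s in starts])
    return np.mean(f0 > 0)

fs = 16000
n = np.arange(fs)
# Synthetic "voiced" signal: 200 Hz fundamental plus two harmonics.
voiced = sum(np.sin(2 * np.pi * h * 200 * n / fs) for h in (1, 2, 3))
noise = np.random.default_rng(0).standard_normal(fs)
print(estimate_f0(voiced[:640]), voiced_fraction(voiced), voiced_fraction(noise))
```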



Figure 1. Waveforms and Spectrograms of Different Soccer Field Audio

The darker the spectrogram, the larger the amplitude.

[Panels: Costa, Argentina, News2, Korea]



Figure 2. Formant-like peaks in crowd noise
(from a Wavesurfer screen dump)
Both segments begin with speech, followed by crowd noise.
Since crowd noise consists mainly of many overlapping human voices, could it confuse a pitch tracker?

[Panels: News2, Costa]



Last Update: 04/01/2001 03:52:37 PM