EE6820 Project: Segmentation / Classification of Soccer Field Audio (1)
Soccer: MPEG-1 system stream, 1.5~1.6 Mb/sec, CBR
Video -- 352*240, 30 frames/second
Audio -- sampling rate 44100 Hz, stereo (down-sampled to 16Kb/s mono before processing)
| No. | Name | Source | Total Length | Language | Content description | Sound Characteristics |
| 1 | Costa | FOX Sports | 01:29:00 | English | Last 75 minutes of 2002 U-Champion Latin America Qualify, Costa Rica vs Guatemala, 8:1 | Clear Commentary, weak crowd noise |
| 2 | Argentina | FOX Sports | 01:29:40 | English | Last 75 minutes of Football Argentino, Los Andes vs. River Plate, 2:0 | Clear commentary, moderate crowd noise, band-limited to ~5.5 kHz (original production or broadcast settings) |
| 3 | News2 | MPEG-7 | 00:15:00 | Spanish | Part of a news program | Clear commentary, strong crowd noise |
| 4 | Korea | MPEG-7 | 00:54:53 | Korean | cannot understand (plays are usually short, the teams seem pretty rusty though) | Commentary: heavy utterance. Crowd: very noisy and excited, with drums and shouting |
Looking at the waveforms and spectrograms below for hints of useful features. Things worth trying out:
(a) Time domain
Amplitude information: total energy in a short time window; mean and variance of amplitude.
Zero-crossing rate, and its 1st~3rd order moments.
... ...
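A minimal sketch of the time-domain features in (a), in plain Python. The frame length, hop size, 16 kHz sampling rate, and all helper names are assumptions made for illustration, not settings from the project. The "1st~3rd order moments" are read here as the mean and the 2nd and 3rd central moments of the per-frame zero-crossing rates.

```python
import math

def frames(signal, frame_len, hop):
    """Split a list of samples into overlapping frames (hypothetical helper)."""
    return [signal[i:i + frame_len]
            for i in range(0, len(signal) - frame_len + 1, hop)]

def short_time_energy(frame):
    """Total energy in the window: sum of squared samples."""
    return sum(x * x for x in frame)

def amplitude_stats(frame):
    """Mean and variance of the absolute amplitude in the window."""
    mags = [abs(x) for x in frame]
    mean = sum(mags) / len(mags)
    var = sum((m - mean) ** 2 for m in mags) / len(mags)
    return mean, var

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:])
                    if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)

def central_moments(values):
    """Mean, 2nd central moment, and 3rd central moment of a list of values."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return mean, m2, m3

# Synthetic input: 0.1 s of a 1 kHz tone at an assumed 16 kHz sampling rate.
sr = 16000
tone = [math.sin(2 * math.pi * 1000 * n / sr) for n in range(1600)]
tone_frames = frames(tone, 400, 200)           # 25 ms frames, 12.5 ms hop
zcrs = [zero_crossing_rate(f) for f in tone_frames]
energies = [short_time_energy(f) for f in tone_frames]
zcr_mean, zcr_m2, zcr_m3 = central_moments(zcrs)
```

A pure tone crosses zero twice per period, so the rate here sits near 2 * 1000 / 16000; speech and crowd noise would be expected to give more variable values from frame to frame, which is what the moments are meant to capture.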
(b) Frequency domain
Subband energy in spectrum (03/31/01)
Subband energy distribution along the frequency axis (frequency "discreteness" for distinguishing formant structure)
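One way the subband features in (b) could be computed, as a rough sketch: take the power spectrum of a short frame, sum it inside a few bands, and normalize to get the distribution along the frequency axis. The band edges, frame length, and 16 kHz sampling rate are assumptions, and the naive DFT is only there to keep the example dependency-free; a real experiment would use an FFT.

```python
import cmath
import math

def power_spectrum(frame):
    """Naive DFT power spectrum for bins 0..N/2 (O(N^2), fine for short frames)."""
    n = len(frame)
    spec = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * ((k * i) % n) / n)
                  for i, x in enumerate(frame))
        spec.append(abs(acc) ** 2)
    return spec

def subband_energies(frame, sr, edges_hz):
    """Sum spectral power inside each [lo, hi) band defined by consecutive edges."""
    n = len(frame)
    energies = [0.0] * (len(edges_hz) - 1)
    for k, p in enumerate(power_spectrum(frame)):
        freq = k * sr / n                      # center frequency of bin k
        for b in range(len(energies)):
            if edges_hz[b] <= freq < edges_hz[b + 1]:
                energies[b] += p
                break
    return energies

def energy_distribution(energies):
    """Normalize subband energies so they sum to 1 (distribution over bands)."""
    total = sum(energies)
    return [e / total for e in energies] if total > 0 else list(energies)

# Synthetic input: a 1 kHz tone, which should land in the 500-2000 Hz band.
sr = 16000                                     # assumed sampling rate
frame = [math.sin(2 * math.pi * 1000 * i / sr) for i in range(256)]
bands = [0, 500, 2000, 4000, 8000]             # hypothetical band edges in Hz
dist = energy_distribution(subband_energies(frame, sr, bands))
```

With bands like these, a clip band-limited to ~5.5 kHz such as Argentina would be expected to put almost nothing in any band above that cutoff, so band edges probably need to be chosen with the source material in mind.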
(c) Cepstrum and more complicated features
MFCC, features incorporating an auditory model, ...
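As a starting point for (c), here is a sketch of the plain real cepstrum (inverse DFT of the log magnitude spectrum). This is the simpler relative of MFCC, not MFCC itself: there is no mel filterbank or DCT. The magnitude floor, frame length, and test signal are assumptions chosen for illustration.

```python
import cmath
import math

def dft(x, sign=-1):
    """Naive DFT (sign=-1) or unscaled inverse DFT (sign=+1); O(N^2)."""
    n = len(x)
    return [sum(x[i] * cmath.exp(sign * 2j * math.pi * ((k * i) % n) / n)
                for i in range(n))
            for k in range(n)]

def real_cepstrum(frame, floor=1e-9):
    """Real cepstrum: inverse DFT of log|DFT(frame)|, with a magnitude floor."""
    n = len(frame)
    log_mag = [math.log(max(abs(c), floor)) for c in dft(frame)]
    # The log-magnitude spectrum of a real frame is symmetric, so the
    # inverse transform is real up to rounding; keep only the real part.
    return [c.real / n for c in dft(log_mag, sign=+1)]

# Synthetic input: an impulse train with a period of 40 samples,
# standing in for a very idealized voiced excitation.
period = 40
frame = [1.0 if i % period == 0 else 0.0 for i in range(240)]
ceps = real_cepstrum(frame)

# A periodic excitation shows up as a cepstral peak at its period (quefrency).
peak_quefrency = max(range(20, 61), key=lambda q: ceps[q])
```

The low-quefrency coefficients describe the spectral envelope, while a peak at higher quefrency points to periodic (voiced) excitation, which is why the cepstrum is a natural bridge between the spectral features in (b) and the pitch idea in (e).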
(d) Posteriors coming out of a speech recognizer
Open questions: how this would perform in noisy environments of different levels, and how it would perform with an unknown language ...
(e) Try to find formant structure using pitch tracking.
Useful if excited/unexcited commentator classification is desired. Overall very complicated; reported accuracy ~75% (49 out of 66).
This may be further confused by pseudo-formant structure in crowd noise (see Figure 2).
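To make the pitch-tracking idea in (e) concrete, here is a sketch of a basic autocorrelation pitch estimator for a single frame. It is a generic textbook approach, not the method behind the ~75% figure quoted above, and the search range (60 to 400 Hz), the 16 kHz sampling rate, and the function name are assumptions.

```python
import math

def autocorr_pitch(frame, sr, fmin=60.0, fmax=400.0):
    """Return (f0_hz, strength) from the highest normalized autocorrelation
    peak in the lag range that corresponds to [fmin, fmax]."""
    n = len(frame)
    mean = sum(frame) / n
    x = [v - mean for v in frame]              # remove DC offset
    energy = sum(v * v for v in x)
    if energy == 0:
        return 0.0, 0.0
    lag_min = int(sr / fmax)
    lag_max = min(int(sr / fmin), n - 1)
    best_lag, best_r = 0, 0.0
    for lag in range(lag_min, lag_max + 1):
        # Biased autocorrelation, normalized by the zero-lag energy.
        r = sum(x[i] * x[i + lag] for i in range(n - lag)) / energy
        if r > best_r:
            best_lag, best_r = lag, r
    if best_lag == 0:
        return 0.0, 0.0                        # no positive peak found
    return sr / best_lag, best_r

# Synthetic "voiced" frame: 200 Hz fundamental plus two harmonics, 40 ms long.
sr = 16000                                     # assumed sampling rate
frame = [math.sin(2 * math.pi * 200 * i / sr)
         + 0.5 * math.sin(2 * math.pi * 400 * i / sr)
         + 0.25 * math.sin(2 * math.pi * 600 * i / sr)
         for i in range(640)]
f0, strength = autocorr_pitch(frame, sr)
```

The returned strength could serve as a rough voicing confidence; whether it stays low enough on crowd noise to keep the tracker from being fooled by pseudo-formant structure is exactly the open question raised here.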
Figure 1. Waveform and Spectrogram of Different Soccer Field Audio
The darker the spectrogram, the larger the amplitude.
| Costa | Argentina |
| ![]() | ![]() |
| News2 | Korea |
| ![]() | ![]() |
Figure 2. Formant-like peaks in crowd noise (from Wavesurfer screen dump)
Both segments have speech at the beginning and crowd noise later on.
After all, crowd noise mainly consists of many human voices; will this confuse the pitch tracker?
| News2 | ![]() |
| Costa | ![]() |
Last Update: 04/01/2001 03:52:37 PM