Our topic relates to the broad area of multimedia processing and analysis. Rather than attempt a comprehensive survey of the literature, we present four subtopics that use closely related approaches or are of specific interest.

Content-based search and retrieval

Content-based search and retrieval is a broad topic. Its main focuses include extraction, representation, and matching of spatio-temporal features; event detection and classification based on low-level features and domain knowledge; segmentation and abstraction at multiple levels (shot, scene, topic); and relevance feedback for user interaction. The following are a few recent surveys of this field.

Sports video analysis

Sports video has inherent structural constraints, defined by the rules of the game and by field production. We believe this structure makes it easier to explore the correlation and interaction among domain constraints, low-level features, and high-level semantics.
Prior work includes domain-specific scene classification[4], classification of the audio track into excited and unexcited commentary[6], audio event spotting via template matching[5][6], interactive browsing via object tracking[7], slow-motion detection by still-frame identification in the MPEG stream[9], and incorporation of a field model and object tracking[8].
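
As an illustration of event spotting via template matching, the following minimal sketch slides a short template over a one-dimensional feature sequence (for example, a per-frame audio energy envelope) and reports the offsets where the normalized correlation exceeds a threshold. It is not the method of [5] or [6]; the function names, the choice of normalized correlation as the similarity measure, and the threshold value are assumptions made for this example.

```python
import math
from typing import List, Sequence


def normalized_correlation(window: Sequence[float], template: Sequence[float]) -> float:
    """Pearson correlation between two equal-length sequences (0.0 if either is constant)."""
    n = len(template)
    mean_w = sum(window) / n
    mean_t = sum(template) / n
    num = sum((w - mean_w) * (t - mean_t) for w, t in zip(window, template))
    den = math.sqrt(
        sum((w - mean_w) ** 2 for w in window)
        * sum((t - mean_t) ** 2 for t in template)
    )
    return num / den if den > 0 else 0.0


def spot_events(signal: Sequence[float], template: Sequence[float],
                threshold: float = 0.9) -> List[int]:
    """Return start offsets where the template matches the signal.

    The threshold is a hypothetical value and would need tuning on real data.
    """
    n = len(template)
    hits = []
    for start in range(len(signal) - n + 1):
        score = normalized_correlation(signal[start:start + n], template)
        if score >= threshold:
            hits.append(start)
    return hits


# Toy data: the template shape appears once, scaled by 2, starting at offset 3.
template = [0, 1, 3, 1, 0]
signal = [0, 0, 0, 0, 2, 6, 2, 0, 0, 0, 0]
hits = spot_events(signal, template)
```

Because the correlation is normalized, the match is insensitive to overall gain, which matters when the same event is recorded at different loudness levels.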

Probabilistic content analysis

Probabilistic reasoning enables inference based on computable audio-visual features and domain-specific knowledge. Related work includes using Bayesian networks to identify multimedia objects and infer multimedia concepts[11, 12], and exploiting dynamic programming techniques or hidden Markov models to distinguish different types of TV programs[13].
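
To make the hidden Markov model approach concrete, the sketch below scores a sequence of quantized features under one model per program type and picks the model with the highest likelihood, computed with the scaled forward algorithm. It is only an illustration under stated assumptions, not the system of [13]: the class name, the two program types, the binary "low motion / high motion" observation alphabet, and all probabilities are hypothetical values chosen for the example.

```python
import math
from typing import Dict, List, Sequence


class DiscreteHMM:
    """Discrete-observation HMM with fixed (not learned) parameters."""

    def __init__(self, initial: List[float], transition: List[List[float]],
                 emission: List[List[float]]):
        self.initial = initial        # initial[s] = P(first state = s)
        self.transition = transition  # transition[r][s] = P(next = s | current = r)
        self.emission = emission      # emission[s][o] = P(observe o | state = s)

    def log_likelihood(self, observations: Sequence[int]) -> float:
        """Scaled forward algorithm: returns log P(observations | model)."""
        if not observations:
            return 0.0
        states = range(len(self.initial))
        alpha = [self.initial[s] * self.emission[s][observations[0]] for s in states]
        log_prob = 0.0
        for t in range(len(observations)):
            if t > 0:
                obs = observations[t]
                alpha = [
                    sum(alpha[r] * self.transition[r][s] for r in states)
                    * self.emission[s][obs]
                    for s in states
                ]
            scale = sum(alpha)
            if scale == 0.0:
                return float("-inf")
            # Rescale to avoid underflow and accumulate the log of the scale factors.
            log_prob += math.log(scale)
            alpha = [a / scale for a in alpha]
        return log_prob


def classify(observations: Sequence[int], models: Dict[str, DiscreteHMM]) -> str:
    """Return the name of the model that assigns the highest likelihood."""
    return max(models, key=lambda name: models[name].log_likelihood(observations))


# Hypothetical models; observation 0 = low motion, 1 = high motion.
models = {
    "news": DiscreteHMM([0.8, 0.2], [[0.9, 0.1], [0.5, 0.5]], [[0.9, 0.1], [0.4, 0.6]]),
    "sports": DiscreteHMM([0.3, 0.7], [[0.5, 0.5], [0.1, 0.9]], [[0.6, 0.4], [0.1, 0.9]]),
}
label_a = classify([1, 1, 1, 0, 1, 1], models)
label_b = classify([0, 0, 0, 0, 1, 0], models)
```

In practice the parameters would be estimated from labeled training sequences rather than set by hand, and the observations would come from the feature extraction stage.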

Front end

Effective and accurate segmentation and feature extraction are crucial to the performance of the whole content analysis system. The output of temporal segmentation usually includes shots, scenes, or other meaningful units. Video data can be further segmented into regions or objects, from which useful spatio-temporal features (such as trajectory and motion) can be extracted.
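
One common way to obtain shot-level temporal segmentation is to compare intensity histograms of consecutive frames and declare a cut where they differ sharply. The sketch below illustrates that idea only; it is not a description of the front end used in this work. Frames are assumed to be flat sequences of 8-bit grayscale pixels, and the function names, bin count, and threshold are hypothetical.

```python
from typing import List, Optional, Sequence


def gray_histogram(frame: Sequence[int], bins: int = 16) -> List[float]:
    """Normalized intensity histogram of a flat sequence of 8-bit pixels."""
    hist = [0.0] * bins
    for pixel in frame:
        hist[pixel * bins // 256] += 1.0
    total = float(len(frame)) or 1.0  # guard against empty frames
    return [count / total for count in hist]


def detect_shot_boundaries(frames: Sequence[Sequence[int]],
                           threshold: float = 0.5,
                           bins: int = 16) -> List[int]:
    """Return indices i where a cut is declared between frame i-1 and frame i."""
    boundaries: List[int] = []
    previous: Optional[List[float]] = None
    for index, frame in enumerate(frames):
        current = gray_histogram(frame, bins)
        if previous is not None:
            # Half the L1 distance lies in [0, 1] for normalized histograms.
            distance = 0.5 * sum(abs(a - b) for a, b in zip(previous, current))
            if distance > threshold:
                boundaries.append(index)
        previous = current
    return boundaries


# Toy sequence: three dark frames followed by two bright frames.
dark = [10] * 64
bright = [200] * 64
cuts = detect_shot_boundaries([dark, dark, dark, bright, bright])
```

A fixed threshold of this kind handles abrupt cuts but not gradual transitions such as fades and dissolves, which generally need a comparison over a longer window.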

