

Hari Sundaram, Lexing Xie, Shih-Fu Chang. A Utility Framework for the Automatic Generation of Audio-Visual Skims. In ACM Multimedia, Juan Les Pins, France, December 2002.

Download

Download paper: Adobe Portable Document Format (PDF)

Copyright notice: This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by the authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author's copyright. These works may not be reposted without the explicit permission of the copyright holder.


In this paper, we present a novel algorithm for generating audio-visual skims from computable scenes. Skims are useful for browsing digital libraries and for on-demand summaries in set-top boxes. A computable scene is a chunk of data that exhibits consistency with respect to chromaticity, lighting and sound. There are three key aspects to our approach: (a) visual complexity and grammar, (b) robust audio segmentation and (c) a utility model for skim generation. We define a measure of the visual complexity of a shot and map complexity to the minimum time needed to comprehend the shot. We then analyze the underlying visual grammar, since it is what makes the shot sequence meaningful. We segment the audio data into four classes and then detect significant phrases in the speech segments. The utility functions are defined in terms of the complexity and duration of each segment. The target skim is created using a general constrained utility maximization procedure that maximizes the information content and the coherence of the resulting skim. The objective function is constrained by multimedia synchronization requirements, by visual syntax and by penalty functions on the audio and video segments. The user study results indicate that the optimal skims show statistically significant differences from the other skims at compression rates of up to 90%.
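The core idea in the abstract, choosing how long to show each shot so that total utility is maximized under a duration budget, can be illustrated with a small sketch. Every specific here is an assumption, not the paper's formulation: the linear map `min_comprehension_time` and its coefficients, the exponential `utility` function, the 0.5-second duration grid and the dynamic-programming solver (a multiple-choice knapsack) are hypothetical stand-ins. The sketch also ignores audio segments, synchronization, visual syntax and the penalty functions.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Shot:
    duration: float    # original shot length in seconds
    complexity: float  # hypothetical visual-complexity score in [0, 1]


def min_comprehension_time(c: float, a: float = 1.0, b: float = 2.0) -> float:
    """Hypothetical linear map from complexity to minimum viewing time (s)."""
    return a + b * c


def utility(t: float, shot: Shot, tau: float = 2.0) -> float:
    """Hypothetical utility of showing `shot` for `t` seconds.

    Zero below the minimum comprehension time; above it, strictly increasing
    and saturating toward the shot's complexity score.
    """
    t_min = min_comprehension_time(shot.complexity)
    if t < t_min:
        return 0.0
    return shot.complexity * (1.0 - math.exp(-(t - t_min + 1.0) / tau))


def make_skim(shots: List[Shot], budget: float,
              step: float = 0.5) -> Tuple[float, List[float]]:
    """Maximize total utility subject to total skim duration <= budget.

    Durations are restricted to multiples of `step`, which turns the problem
    into a multiple-choice knapsack solved exactly by dynamic programming.
    Returns (total utility, per-shot durations); 0.0 means the shot is dropped.
    """
    units = int(round(budget / step))
    # best[b] = best (utility, durations so far) using at most b time units
    best: List[Tuple[float, List[float]]] = [(0.0, [])] * (units + 1)
    for shot in shots:
        lo = max(1, math.ceil(min_comprehension_time(shot.complexity) / step))
        hi = math.floor(shot.duration / step)
        # Option 1: drop the shot entirely.
        new = [(u, picks + [0.0]) for u, picks in best]
        # Option 2: keep it for k grid units, never below its minimum time.
        for b in range(units + 1):
            for k in range(lo, min(hi, b) + 1):
                u_prev, picks = best[b - k]
                u = u_prev + utility(k * step, shot)
                if u > new[b][0]:
                    new[b] = (u, picks + [k * step])
        best = new
    return best[units]
```

For example, `make_skim([Shot(6.0, 0.9), Shot(4.0, 0.2), Shot(5.0, 0.5)], budget=15.0)` keeps every shot at full length because the budget covers the whole sequence; a smaller budget (for instance 10% of the original length for a 90% compression rate) forces shots to be trimmed toward their minimum comprehension time or dropped. Discretizing durations keeps the exact optimum tractable; a finer `step` trades runtime for precision.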


Hari Sundaram
Lexing Xie
Shih-Fu Chang

BibTeX Reference

   Author = {Sundaram, Hari and Xie, Lexing and Chang, Shih-Fu},
   Title = {A Utility Framework for the Automatic Generation of Audio-Visual Skims},
   BookTitle = {ACM Multimedia},
   Address = {Juan Les Pins, France},
   Month = {December},
   Year = {2002}

EndNote Reference

Get EndNote Reference (.ref)


