Constrained Utility Maximization for Generating Visual Skims

Hari Sundaram and Shih-Fu Chang

Abstract

In this paper, we present a novel algorithm to generate visual skims, that do not contain audio, from computable scenes. Visual skims are useful for browsing digital libraries, and for on-demand summaries in set-top boxes. A computable scene is a chunk of data that exhibits consistencies with respect to chromaticity, lighting and sound. First, we define visual complexity of a shot to be its Kolmogorov complexity. Then, we conduct experiments that help us map the complexity of a shot into the minimum time required for its comprehension. Second, we analyze the grammar of the film language, since it makes the shot sequence meaningful. We achieve a target skim time by minimizing a sequence utility function. It is subject to shot duration constraints, and penalty functions based on sequence rhythm, and information loss. This helps us determine individual shot durations as well as the shots to drop. Our user studies show good results on skims with compression rates up to 80%.