This research analyzes what interests users
when they organize and access consumer videos.
We propose relevant Dimensions Of Interest (DOI) and models to predict
the user interest for each DOI using audio-visual features. In particular,
we distinguish between the importance of objects
(main character or entity), scenes (composition
or aggregate of objects) and events (action
or change in state of objects) in consumer videos from the user's perspective.
We also present a taxonomy
of relevant concepts for each DOI tailored to the consumer video domain.
Our contributions are backed by extensive data and a user study. Real
users were asked to score the importance of each DOI from 1 to 3 and
to annotate video clips using the taxonomy or free text. The results
show high consistency (about 70%) and independence of the object, scene
and event scores, confirming their suitability as basic DOIs. In addition,
these scores can be accurately predicted using simple models based on
heuristic rules and neural networks, which demonstrates the potential
of the DOIs for improving consumer video applications.
In the subjective study,
viewers were asked to rate the importance of each DOI
and assign semantic class labels to each consumer video clip based on
the proposed concept taxonomy.
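
The scoring protocol of the user study (each viewer gives every clip a score from 1 to 3 per DOI) lends itself to a simple check of consistency and independence. The sketch below is only an illustration under stated assumptions: the ratings are made-up placeholder data, consistency is taken to be the fraction of viewer pairs that give identical scores, and independence is probed with the Pearson correlation between per-clip mean scores. The study does not say how its figure of about 70% was computed, so the function names and both measures are hypothetical choices, not the authors' method.

```python
from itertools import combinations
from math import sqrt
from statistics import mean

DOIS = ("object", "scene", "event")

# Placeholder data (not from the study): clip -> viewer -> {DOI: score in 1..3}.
ratings = {
    "clip_a": {
        "viewer_1": {"object": 3, "scene": 1, "event": 2},
        "viewer_2": {"object": 3, "scene": 1, "event": 1},
        "viewer_3": {"object": 3, "scene": 2, "event": 2},
    },
    "clip_b": {
        "viewer_1": {"object": 1, "scene": 3, "event": 2},
        "viewer_2": {"object": 2, "scene": 3, "event": 3},
        "viewer_3": {"object": 1, "scene": 3, "event": 2},
    },
    "clip_c": {
        "viewer_1": {"object": 2, "scene": 2, "event": 3},
        "viewer_2": {"object": 2, "scene": 2, "event": 3},
        "viewer_3": {"object": 2, "scene": 1, "event": 3},
    },
}


def pairwise_agreement(ratings, doi):
    """Fraction of same-clip viewer pairs that gave identical scores for `doi`.

    Assumed consistency measure; the study may have used a different one.
    """
    agree = total = 0
    for viewers in ratings.values():
        scores = [scores_by_doi[doi] for scores_by_doi in viewers.values()]
        for a, b in combinations(scores, 2):
            agree += int(a == b)
            total += 1
    return agree / total if total else float("nan")


def mean_scores(ratings, doi):
    """Per-clip mean score for `doi`, in the iteration order of `ratings`."""
    return [
        mean(scores_by_doi[doi] for scores_by_doi in viewers.values())
        for viewers in ratings.values()
    ]


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (NaN if one is constant)."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return float("nan")
    return cov / sqrt(var_x * var_y)


if __name__ == "__main__":
    for doi in DOIS:
        print(f"{doi}: agreement = {pairwise_agreement(ratings, doi):.2f}")
    # Correlations near zero would be consistent with independent DOI scores.
    for a, b in combinations(DOIS, 2):
        r = pearson(mean_scores(ratings, a), mean_scores(ratings, b))
        print(f"{a} vs {b}: r = {r:.2f}")
```

With real annotations, a chance-corrected statistic could replace the raw agreement fraction, since three-level scores agree fairly often by chance alone.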