Efficient Alternative to Bag-of-words for Visual Recognition


We present an efficient alternative to the traditional bag-of-visual-words (BoW) vocabulary used for visual classification tasks. Our representation is both conceptually and computationally superior to bag-of-visual-words: (1) we iteratively generate a maximum likelihood estimate of an image given a set of characteristic features, in contrast to BoW methods, where an image is represented as a histogram of visual words; (2) we randomly sample a set of characteristic features instead of employing the computationally intensive clustering algorithms used during the vocabulary generation step of BoW methods. Performance comparable to the state of the art in experiments on three challenging human action datasets and an equally challenging scene categorization dataset demonstrates the universal applicability of our method.
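To make the contrast in point (2) concrete, the following is a minimal sketch, not the authors' implementation. It draws a set of characteristic features by uniform random sampling from a pool of descriptors, with no clustering step, and then computes the conventional BoW histogram over that set as the baseline encoding the method is compared against. The function names, the uniform sampling scheme, the squared Euclidean distance, and the toy 4-dimensional descriptors are all assumptions made for illustration. The maximum likelihood estimation step is not sketched because its details are not given here.

```python
import random


def sample_characteristic_features(descriptors, k, seed=0):
    """Pick k descriptors uniformly at random, without replacement.

    Hypothetical stand-in for the random sampling step; it replaces the
    clustering (e.g. k-means) normally used to build a BoW vocabulary.
    """
    rng = random.Random(seed)
    return rng.sample(descriptors, k)


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def bow_histogram(image_descriptors, vocabulary):
    """Baseline BoW encoding: normalized counts of nearest-entry assignments."""
    counts = [0] * len(vocabulary)
    for d in image_descriptors:
        nearest = min(
            range(len(vocabulary)),
            key=lambda i: squared_distance(d, vocabulary[i]),
        )
        counts[nearest] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else counts


# Toy data (assumed): 200 random 4-dimensional descriptors.
data_rng = random.Random(42)
pool = [[data_rng.random() for _ in range(4)] for _ in range(200)]

# Sampling costs O(k); no iterative clustering over the pool is needed.
features = sample_characteristic_features(pool, k=8)

# Histogram for one "image" made of the first 50 descriptors.
hist = bow_histogram(pool[:50], features)
```

Every sampled feature is an actual descriptor from the pool rather than a cluster centroid, which is what removes the vocabulary generation cost in this sketch.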

Method Summary and Results

A subset of the data, along with ground truth annotations, is available here.

Relevant Publications

