Efficient Alternative to Bag-of-words for Visual Recognition


We present an efficient alternative to the traditional vocabulary-based bag-of-visual-words (BoW) representation used for visual classification tasks. Our representation is both conceptually and computationally superior to BoW: (1) we iteratively generate a maximum likelihood estimate of an image given a set of characteristic features, in contrast to BoW methods, where an image is represented as a histogram of visual words; (2) we randomly sample the set of characteristic features instead of employing the computationally intensive clustering algorithms used during the vocabulary generation step of BoW methods. Performance comparable to the state of the art in experiments on three challenging human action datasets and an equally challenging scene categorization dataset demonstrates the broad applicability of our method.
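The two ideas above can be illustrated with a minimal sketch. It is not the published implementation. It assumes that the characteristic features are descriptors drawn uniformly at random from a training pool, and that an image is described by the mixture weights of Gaussian kernels centered on those features, fitted with EM-style maximum likelihood updates. The function names, the Gaussian kernel, the bandwidth, and the iteration count are all illustrative assumptions.

```python
import numpy as np


def sample_characteristic_features(descriptors, k, rng):
    """Pick k descriptors uniformly at random (replaces the clustering step of BoW)."""
    idx = rng.choice(len(descriptors), size=k, replace=False)
    return descriptors[idx]


def ml_mixture_weights(image_descriptors, centers, bandwidth=1.0, n_iter=50):
    """Iteratively estimate mixture weights over the sampled centers.

    Assumption: each center contributes a fixed isotropic Gaussian kernel and
    only the mixing weights are fitted, via EM updates that increase the
    likelihood of the image's descriptors.
    """
    # Squared Euclidean distances between every descriptor and every center: (n, k).
    diff = image_descriptors[:, None, :] - centers[None, :, :]
    d2 = (diff ** 2).sum(axis=2)
    # Subtracting the row minimum rescales each row by a constant, which
    # cancels in the normalization below and avoids underflow.
    kern = np.exp(-(d2 - d2.min(axis=1, keepdims=True)) / (2.0 * bandwidth ** 2))

    k = centers.shape[0]
    weights = np.full(k, 1.0 / k)  # start from uniform weights
    for _ in range(n_iter):
        # E-step: responsibility of each center for each descriptor.
        resp = kern * weights
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: new weight is the average responsibility.
        weights = resp.mean(axis=0)
    return weights


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    pool = rng.normal(size=(200, 8))   # stand-in for training descriptors
    centers = sample_characteristic_features(pool, 16, rng)
    image = rng.normal(size=(50, 8))   # stand-in for one image's descriptors
    representation = ml_mixture_weights(image, centers)
    print(representation.shape, representation.sum())
```

Under these assumptions, the resulting weight vector plays the role of the BoW histogram: it has one entry per characteristic feature and can be passed to any standard classifier.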

Method Summary and Results


A subset of the data, along with ground truth annotations, is available here.

Relevant Publications

Subhabrata Bhattacharya, Rahul Sukthankar, Rong Jin, Mubarak Shah, "A Probabilistic Representation for Efficient Large Scale Visual Recognition Tasks", In Proc. of IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Colorado Springs, USA, pp. 2593-2600, 2011.