INDetector – Columbia’s Image Near-Duplicate Detection Software


INDetector is a software program for detecting near-duplicate images in image and video retrieval applications. Its algorithm uses a machine-learning method to learn a parts-based similarity measure between images. Images are represented as parts-based models, including the Attributed Relational Graph (ARG) and Bag-of-Parts (BoP).

The package contains the following components:

         Parts detection and feature extraction to obtain ARG or BoP representations. Parts are detected with the Harris corner detector. Each part is described by spatial, color, and texture features; the texture features are extracted with Gabor wavelet filters.


         Near-duplicate image detection learning. The learning component learns the detection parameters from annotated training data.


         Near-duplicate image detection. The detection component reads the parameters learned during the training phase and outputs decision scores for determining whether two images are near-duplicates.
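
As an illustration of the parts-detection step, the following is a minimal sketch of a Harris corner detector in plain Python. It is not the package's MATLAB or C++ code. The function names (harris_response, detect_parts), the box-shaped window, the constant k = 0.04, and the threshold are all assumptions chosen for the example.

```python
def harris_response(img, k=0.04, radius=1):
    """Return the Harris response R = det(M) - k * trace(M)^2 for each pixel.

    img is a grayscale image given as a list of rows of floats.
    k and radius are illustrative defaults, not values taken from INDetector.
    """
    h, w = len(img), len(img[0])
    # Central-difference gradients; border pixels are left at zero.
    ix = [[0.0] * w for _ in range(h)]
    iy = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            ix[y][x] = (img[y][x + 1] - img[y][x - 1]) / 2.0
            iy[y][x] = (img[y + 1][x] - img[y - 1][x]) / 2.0

    resp = [[0.0] * w for _ in range(h)]
    for y in range(radius, h - radius):
        for x in range(radius, w - radius):
            # Sum gradient products over a box window to form the
            # 2x2 structure tensor M = [[sxx, sxy], [sxy, syy]].
            sxx = syy = sxy = 0.0
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    gx, gy = ix[y + dy][x + dx], iy[y + dy][x + dx]
                    sxx += gx * gx
                    syy += gy * gy
                    sxy += gx * gy
            det = sxx * syy - sxy * sxy
            trace = sxx + syy
            resp[y][x] = det - k * trace * trace
    return resp


def detect_parts(img, threshold, k=0.04, radius=1):
    """Return (row, col) positions whose response exceeds the threshold
    and is a maximum within its 3x3 neighbourhood."""
    resp = harris_response(img, k, radius)
    h, w = len(resp), len(resp[0])
    parts = []
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            r = resp[y][x]
            if r <= threshold:
                continue
            neighbours = [resp[y + dy][x + dx]
                          for dy in (-1, 0, 1) for dx in (-1, 0, 1)
                          if (dy, dx) != (0, 0)]
            if all(r >= n for n in neighbours):
                parts.append((y, x))
    return parts
```

In this sketch, flat regions and straight edges produce responses at or below zero, so only corner-like points survive the threshold. A production detector would normally use a Gaussian window and an image-processing library instead of nested Python loops.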

The package contains MATLAB and C++ source code for the three components above. The programs can be compiled under Windows or Linux using Visual C++ or gcc. The package also contains a data set for evaluating near-duplicate image detection performance, extracted from the TRECVID 2003 benchmark data sets. Due to copyright restrictions, the images themselves are not included in the data set.
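
One way to use decision scores for evaluation is to threshold them and compare the result with annotated image pairs. The sketch below computes precision and recall at a given threshold. The function name precision_recall, the assumption that a higher score means "more likely near-duplicate", and the choice of metric are all illustrative; the evaluation protocol used with the package is not described here.

```python
def precision_recall(scores, labels, threshold):
    """Compute precision and recall for thresholded decision scores.

    scores: one decision score per image pair (assumed: higher = more similar).
    labels: True where the pair is annotated as a near-duplicate.
    """
    predicted = [s >= threshold for s in scores]
    tp = sum(1 for p, l in zip(predicted, labels) if p and l)
    fp = sum(1 for p, l in zip(predicted, labels) if p and not l)
    fn = sum(1 for p, l in zip(predicted, labels) if not p and l)
    # By convention, report 1.0 when a ratio's denominator is empty.
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 1.0
    return precision, recall
```

Sweeping the threshold over the range of observed scores gives a precision-recall curve, which makes the trade-off between missed near-duplicates and false alarms visible.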

Publications related to the software include the following:

         Dong-Qing Zhang. “Statistical Part-Based Models: Theory and Applications in Image Similarity, Object Detection and Region Labelling.” PhD Thesis, Graduate School of Arts and Sciences, Columbia University, 2005.


         Dong-Qing Zhang and Shih-Fu Chang. “Detecting Image Near-Duplicate by Stochastic Attributed Relational Graph Matching with Learning.” ACM Multimedia Conference, 2004.


To download the package, please go to the download page. Registration with your email address is required. Before using the package, please read the ReadMe and CopyRight files in the package carefully.



For more detailed information about our image near-duplicate detection project, please visit our near-duplicate detection project web page.


To report a bug, please email Dong-Qing Zhang. For technical and licensing questions, please email Prof. Shih-Fu Chang and Dong-Qing Zhang.


For problems or questions regarding this web site, contact the webmaster.
Last updated: June 12, 2003.