This is the homepage of the content-based image similarity search and concept prediction tool developed at Columbia University for the DARPA MEMEX project. The objective of the MEMEX project is to find information in very large, highly unstructured sources on the Web. One application domain is fighting human trafficking crimes online. In this context, our objective is to find similar images based on visual content in a very large repository of up to 400 million images.

Columbia Image Similarity Search

The Columbia Image Similarity Search and Concept Prediction REST API Tool returns visually similar and duplicate images for any given image. Our tool uses deep-learning-based image features, namely the Sentibank concepts [1], which are Adjective-Noun Pairs (ANPs) with strong correlation to visual sentiment and emotions. Thanks to these semantic features, our content-based retrieval method is able to match human subjects based on hairstyle, physical characteristics, dress, environment and pose. To handle the extremely large image database and support fast search response, our group has also developed compact hashing techniques, such as those described in [2], that index and match images using only a small number of hash bits.
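To illustrate what indexing images "by just a small number of hash bits" can look like, here is a minimal sketch that turns a feature vector into a compact binary code. It assumes random-hyperplane hashing, a simple unsupervised stand-in, and not the supervised kernel hashing of [2] or the tool's actual implementation. The function names `make_hyperplanes`, `hash_code` and `hamming` are hypothetical.

```python
import random


def make_hyperplanes(dim, n_bits, seed=0):
    """Draw n_bits random hyperplanes (assumption: Gaussian, fixed seed)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def hash_code(feature, hyperplanes):
    """Pack one bit per hyperplane into an integer: 1 if the feature
    lies on the non-negative side of the hyperplane, else 0."""
    code = 0
    for plane in hyperplanes:
        dot = sum(f * w for f, w in zip(feature, plane))
        code = (code << 1) | (1 if dot >= 0 else 0)
    return code


def hamming(a, b):
    """Number of differing bits between two packed codes."""
    return bin(a ^ b).count("1")


# Toy 4-dimensional "features"; real deep features have far more dimensions.
planes = make_hyperplanes(dim=4, n_bits=16)
image = [1.0, 2.0, -0.5, 0.3]
rescaled = [2.0 * x for x in image]   # same direction, different magnitude
opposite = [-x for x in image]        # opposite direction

print(hamming(hash_code(image, planes), hash_code(rescaled, planes)))
print(hamming(hash_code(image, planes), hash_code(opposite, planes)))
```

Vectors pointing in similar directions tend to share most bits, so a small Hamming distance between codes serves as a cheap proxy for feature similarity.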

Overview of the Columbia Image Similarity Search tool.


The image similarity service deployed in the MEMEX project currently indexes around 400 million images from the human trafficking domain. It supports incremental updates, so the index is current to within a few hours. Querying a previously unseen image takes a few seconds, of which about 1.5 seconds are spent computing the image feature and the rest comparing it against the indexed images using the compact hashing representation.
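The comparison step and the incremental updates described above can be sketched as a Hamming-distance ranking over packed binary codes. This is a toy, in-memory illustration under stated assumptions: the class `HammingIndex` and its methods are hypothetical names, the scan is a linear pass, and the deployed service is not claimed to work this way.

```python
import heapq


class HammingIndex:
    """Toy in-memory index mapping image IDs to packed binary hash codes."""

    def __init__(self):
        self._codes = {}

    def add(self, image_id, code):
        """Incremental update: new images are searchable right after insertion."""
        self._codes[image_id] = code

    def search(self, query_code, k=5):
        """Return the k nearest (distance, image_id) pairs by Hamming distance."""
        scored = (
            (bin(code ^ query_code).count("1"), image_id)
            for image_id, code in self._codes.items()
        )
        return heapq.nsmallest(k, scored)


index = HammingIndex()
index.add("img_a", 0b1111)
index.add("img_b", 0b1110)
index.add("img_c", 0b0000)

# The exact duplicate comes first, then the code that differs in one bit.
print(index.search(0b1111, k=2))
```

A linear scan like this is only practical for small collections; at the scale of hundreds of millions of images, a real system would need additional indexing structures on top of the codes.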

Our image similarity search tool has been an important component of the DIG Search Engine. As we reported in [3], the system was first deployed "to six law enforcement agencies and several NGOs that are all using the system in various ways to fight human trafficking, such as by locating victims or researching organizations that are engaging in human trafficking. (...) Reports to date indicate that DIG tool has already been successfully used to identify several victims of human trafficking, but due to privacy concerns we cannot provide additional details." After the evaluation of this first prototype, an updated version of the DIG application has been deployed to more than 200 government agencies, and it is now actively used to fight human trafficking.


Our API tool has been widely used by different groups in the MEMEX project. Our team works closely with the DIG team from USC/ISI, and the DIG Search Engine uses our API for content-based image search. Our tool has also been integrated into the Link HT system of Qadium and the ImageSpace system of JPL-Continuum-Kitware. Our tool is available on GitHub [4] and is part of the DARPA Open Source Code Catalog [5].

DIG Search Engine

Link HT from Qadium

Teams and organizations that are using our tool:

DIG, USC/ISI, InferLink, Next Century,
JPL, Qadium, Continuum Analytics, Kitware

Publications and references

  • [1] D. Borth, R. Ji, T. Chen, T. Breuel and S.-F. Chang. "Large-scale Visual Sentiment Ontology and Detectors Using Adjective Noun Pairs," ACM Multimedia Conference, 2013.

  • [2] W. Liu, J. Wang, R. Ji, Y.-G. Jiang and S.-F. Chang. "Supervised Hashing with Kernels," IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2012.

  • [3] P. Szekely, C. Knoblock, J. Slepicka, A. Philpot, A. Singh, C. Yin, D. Kapoor, P. Natarajan, D. Marcu, K. Knight, D. Stallard, S. S. Karunamoorthy, R. Bojanapalli, S. Minton, B. Amanatullah, T. Hughes, M. Tamayo, D. Flynt, R. Artiss, S.-F. Chang, T. Chen, G. Hiebel and L. Ferreira. "Building and Using a Knowledge Graph to Combat Human Trafficking," International Semantic Web Conference (ISWC), 2015. Best Paper, Application Track.

  • [4] Columbia Image Search tool for MEMEX Github repository.

  • [5] DARPA Open Catalog website.

About Us

Our team is from the Digital Video and Multimedia (DVMM) Lab of Columbia University.


This research is supported in part by the Defense Advanced Research Projects Agency (DARPA) and the Air Force Research Laboratory (AFRL) under contract number FA8750-14-C-0240. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of DARPA or the U.S. Government.