The DVMM Lab at Columbia University is dedicated to research in multimedia content analysis, retrieval, and communication. We are particularly interested in answering questions such as: what information can computers extract from images, videos, and multimodal data; how can such information be harnessed to build large-scale search engines and recognition systems; and how can optimal system performance be achieved in constrained environments such as mobile devices?

To reach these goals, we tackle related problems in computer vision, machine learning, signal processing, and information retrieval, with a focus on the following areas.

  • multimedia search and retrieval
  • machine learning and computer vision
  • mobile communication and applications
  • media security and forensics
  • benchmarking and standards

We have applied our research to several domains, in close collaboration with colleagues in medicine, journalism, and education, as well as in industry. Highlights include automatic indexing systems for large collections of images and videos in sports, news, open-source web, surveillance, and biomedical applications. In most cases, we explore novel uses of techniques from machine learning, computer vision, and multimedia content analysis to extract patterns, semantics, and knowledge from multimedia content collections.

Our recent research focuses on two fundamental challenges: the semantic gap in visual retrieval, and Web-scale content processing. To tackle the former, we co-led an effort to develop the first large multimedia concept ontology, called LSCOM, which includes more than 1,000 concepts related to scenes, objects, people, and activities. To help stimulate research in this area, we developed and released a machine learning toolbox (called Columbia374) for detecting 374 semantic concepts in videos. We demonstrated top accuracy in detecting semantic concepts (2008) and high-level multimedia events (2010) in TRECVID, the international video retrieval evaluation forum organized by NIST. To address the Web-scale challenge, we have developed a series of theories and algorithms that significantly advance the performance of nearest neighbor search over datasets at the scale of millions of items or more, exploring novel ideas in optimized locality sensitive hashing, semi-supervised hashing, and anchor graph based hashing. Using these advanced hashing techniques, we recently demonstrated a fully operational mobile visual search system that can search 1 million product images over 3G networks within just 2-3 seconds. In another large-scale evaluation, the 2011 DARPA Shredder Grand Challenge, we applied novel image search techniques to develop a system that can reassemble documents from thousands of shredded pieces, achieving a performance ranked within the top five among nearly 9,000 participating teams. For more details of the research projects, please see the DVMM projects page.
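The core idea behind hashing-based nearest neighbor search can be illustrated with a minimal random-hyperplane locality sensitive hashing sketch in Python. This is a toy illustration of the general technique, not the lab's optimized, semi-supervised, or anchor-graph variants; all function names and parameters here are hypothetical:

```python
import numpy as np

def lsh_codes(X, n_bits=16, seed=0):
    """Map each row of X to an n_bits binary code via random hyperplanes.

    Points on the same side of a random hyperplane get the same bit, so
    nearby points tend to share many bits of their codes.
    """
    rng = np.random.default_rng(seed)
    planes = rng.normal(size=(X.shape[1], n_bits))  # one hyperplane per bit
    return (X @ planes > 0).astype(np.uint8)

def hamming_candidates(codes, query_code, radius=2):
    """Return indices of database codes within `radius` Hamming distance."""
    dists = np.count_nonzero(codes != query_code, axis=1)
    return np.flatnonzero(dists <= radius)

# Toy usage: index 1,000 random 64-d feature vectors, then look up one of them.
X = np.random.default_rng(1).normal(size=(1000, 64))
codes = lsh_codes(X)                      # shape (1000, 16)
q = lsh_codes(X[42:43])[0]                # code for item 42, same hash family
candidates = hamming_candidates(codes, q, radius=0)
assert 42 in candidates                   # an exact duplicate always matches
```

Comparing short binary codes is far cheaper than comparing raw feature vectors, which is what makes this family of methods attractive at the scale of millions of images; the learned variants mentioned above improve on random hyperplanes by fitting the hash functions to the data.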



For problems or questions regarding this web site contact The Web Master.
Last updated: June 12, 2003.