%0 Conference Proceedings %F NIST-TRECVID11:IBM-DVMM %A Cao, L. %A Chang, S.-F. %A Codella, N. %A Cotton, C. %A Ellis, D. %A Gong, L. %A Hill, M. %A Hua, G. %A Kender, J. %A Merler, M. %A Mu, Y. %A Natsev, A. %A Smith, J. R. %T IBM Research and Columbia University TRECVID-2011 Multimedia Event Detection (MED) System %B NIST TRECVID Workshop %C Gaithersburg, MD %X The IBM Research/Columbia team investigated a novel range of low-level and high-level features and their combination for the TRECVID Multimedia Event Detection (MED) task. We submitted four runs exploring various methods of extraction, modeling and fusing of low-level features and hundreds of high-level semantic concepts. Our Run 1 developed event detection models utilizing Support Vector Machines (SVMs) trained from a large number of low-level features and was interesting in establishing the baseline performance for visual features from static video frames. Run 2 trained SVMs from classification scores generated by 780 visual, 113 action and 56 audio high-level semantic classifiers and explored various temporal aggregation techniques. Run 2 was interesting in assessing performance based on different kinds of high-level semantic information. Run 3 fused the low- and high-level feature information and was interesting in providing insight into the complementarity of this information for detecting events. Run 4 fused all of these methods and explored a novel Scene Alignment Model (SAM) algorithm that utilized temporal information discretized by scene changes in the video. %U http://www.ee.columbia.edu/ln/dvmm/publications/11/ibm-columbia-trecvid-med11.pdf %8 December %D 2011