A collaboration with the DVMM Lab. My side focuses on identifying and describing salient events in the audio track of a video, with the goal of linking these events to objects identified in the visual domain. You can read about our current approach here.
This work uses the matching pursuit algorithm to select salient elements in audio. Local pairs of these elements define events, which are compared to one another to identify similar-sounding events.
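To make the decomposition step concrete, here is a minimal sketch of generic matching pursuit, not the lab's actual implementation: the signal is greedily decomposed over a dictionary of unit-norm atoms (the dictionary, atom count, and function name are illustrative assumptions), and the selected atoms are the "salient elements" that downstream event detection would pair up.

```python
import numpy as np

def matching_pursuit(signal, dictionary, n_atoms=10):
    """Greedy matching pursuit over a dictionary whose rows are
    unit-norm atoms: repeatedly pick the atom most correlated with
    the residual, subtract its projection, and record the pick."""
    residual = signal.astype(float).copy()
    atoms = []
    for _ in range(n_atoms):
        correlations = dictionary @ residual          # inner product with every atom
        k = int(np.argmax(np.abs(correlations)))      # most correlated atom
        coeff = float(correlations[k])
        residual -= coeff * dictionary[k]             # remove its contribution
        atoms.append((k, coeff))
    return atoms, residual

# Toy usage: a signal built from two atoms of an orthonormal dictionary
# is recovered exactly, and the residual shrinks to zero.
atoms, residual = matching_pursuit(np.array([3.0, 2.0, 0.0, 0.0]),
                                   np.eye(4), n_atoms=2)
```

In practice the dictionary would hold time-frequency atoms (e.g. Gabor functions) rather than an identity matrix, so each selected atom carries a time, frequency, and scale that can anchor an audio event.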