Localization of actions, events, and activities in long, untrimmed videos.
Real applications usually involve long, untrimmed videos that can contain highly unconstrained background scenes or irrelevant activities, and a single video can contain multiple action instances. Localizing actions and activities in long videos can save substantial time and computational cost.
(1) Determine whether a video contains specific actions or activities (such as diving or jumping).
(2) Identify temporal boundaries (start time and end time) of each action or activity instance.
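The two sub-tasks above can be illustrated with a minimal sketch. It assumes that some upstream model has already produced one confidence score per frame for a single action class. The function names (`contains_action`, `localize`), the fixed threshold of 0.5, and the frame rate are all hypothetical choices made for illustration, not part of the task definition or of any particular method.

```python
from typing import List, Tuple


def contains_action(scores: List[float], threshold: float = 0.5) -> bool:
    """Sub-task (1): does any frame score reach the (assumed) threshold?"""
    return any(s >= threshold for s in scores)


def localize(scores: List[float], fps: float,
             threshold: float = 0.5) -> List[Tuple[float, float]]:
    """Sub-task (2): group consecutive above-threshold frames into
    (start_time, end_time) segments, expressed in seconds."""
    segments = []
    start = None  # index of the first frame of the currently open segment
    for i, s in enumerate(scores):
        if s >= threshold and start is None:
            start = i                       # a new instance begins here
        elif s < threshold and start is not None:
            segments.append((start / fps, i / fps))
            start = None                    # the instance ended before frame i
    if start is not None:                   # instance runs to the end of the video
        segments.append((start / fps, len(scores) / fps))
    return segments


# Hypothetical per-frame scores for one class (e.g. "diving") at 1 frame per second.
scores = [0.1, 0.9, 0.8, 0.2, 0.1, 0.7, 0.7, 0.9]
print(contains_action(scores))      # True
print(localize(scores, fps=1.0))    # [(1.0, 3.0), (5.0, 8.0)]
```

Here one video yields two instances of the same class, which matches the observation that an untrimmed video can contain multiple instances separated by background frames.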