Event Detection in Baseball Video Using Superimposed Caption Recognition

Dongqing Zhang, Shih-Fu Chang


We have developed a novel system for baseball video event detection and summarization using superimposed caption text detection and recognition. The system detects different types of semantic level events in baseball video including scoring and last pitch of each batter. The system has two components: event detection and event boundary detection. Event detection is realized by change detection and recognition of game stat texts (such as text information showing in score box). Event boundary detection is achieved using our previously developed algorithm, which detects the pitch view as the event beginning and non-active view as potential endings of the event. One unique contribution of the system is its capability to accurately detect the semantic level events by combining video text recognition with camera view recognition. Another unique feature is the real-time processing speed by taking advantage of compressed-domain approaches in part of the algorithms such as caption detection. To the best of our knowledge, this is the first system achieving accurate detection of multiple types of high-level semantic events in baseball videos.