GENERAL AND DOMAIN-SPECIFIC TECHNIQUES FOR DETECTING AND RECOGNIZING SUPERIMPOSED TEXT IN VIDEO
Dongqing Zhang, Raj Kumar Rajendran, and Shih-Fu Chang
We have developed general and domain-specific algorithms for extracting and recognizing caption text in digital video. Our system has several distinguishing features. For caption box localization, we combine compressed-domain features derived from DCT coefficients and motion vectors, and we exploit long-term temporal consistency to improve localization performance. For character segmentation, we use a single-pass, threshold-free approach that combines classification and projection to address noisy segmentation, text intensity variation, and algorithm complexity. For recognition, we use Zernike moments to improve accuracy. Finally, we exploit domain knowledge through a statistical transition graph model that enhances recognition of domain-specific characters, such as the ball count and game score in baseball videos. The algorithms achieved real-time speed and significantly improved recognition accuracy. Furthermore, although the experiments were conducted on baseball videos only, the algorithms (except the transition model) are general and can be used in other applications, such as news and films.
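To illustrate how a transition model can correct per-frame recognition errors, the following is a minimal sketch and not the model used in this work. It assumes a single ball counter with states 0 to 3, hypothetical transition probabilities (`p_stay`, `p_inc`, `p_reset`), and hypothetical per-frame recognizer scores; the function names are likewise illustrative. It applies standard Viterbi decoding so that transitions the game rules forbid (for example, a jump from 1 ball to 3) are ruled out.

```python
import math

NEG_INF = float("-inf")


def _log(p):
    # Log that maps zero probability to -inf instead of raising.
    return math.log(p) if p > 0 else NEG_INF


def transition_prob(a, b, n_states, p_stay, p_inc, p_reset):
    # Assumed rules: the count stays, goes up by one, or resets to 0.
    # From the last state an "increment" ends the at-bat, so it resets.
    last = n_states - 1
    p = 0.0
    if b == a:
        p += p_stay
    if b == a + 1:
        p += p_inc
    if b == 0:
        p += p_reset + (p_inc if a == last else 0.0)
    return p


def decode_counts(frame_scores, p_stay=0.90, p_inc=0.08, p_reset=0.02):
    """Viterbi decoding of the most likely count sequence.

    frame_scores[t][s] is a (hypothetical) recognizer score that frame t
    shows digit s. The sequence must be non-empty.
    """
    n_states = len(frame_scores[0])
    score = [_log(p) for p in frame_scores[0]]  # uniform prior assumed
    back = []
    for obs in frame_scores[1:]:
        new_score, ptr = [], []
        for b in range(n_states):
            cands = [
                score[a]
                + _log(transition_prob(a, b, n_states, p_stay, p_inc, p_reset))
                for a in range(n_states)
            ]
            best_a = max(range(n_states), key=cands.__getitem__)
            new_score.append(cands[best_a] + _log(obs[b]))
            ptr.append(best_a)
        score = new_score
        back.append(ptr)
    # Trace back from the best final state.
    state = max(range(n_states), key=score.__getitem__)
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path


# Made-up scores: frame 3 is misread as "3" by the per-frame recognizer.
frame_scores = [
    [0.90, 0.05, 0.03, 0.02],
    [0.80, 0.10, 0.05, 0.05],
    [0.10, 0.80, 0.05, 0.05],
    [0.05, 0.40, 0.05, 0.50],
    [0.05, 0.10, 0.80, 0.05],
    [0.05, 0.10, 0.80, 0.05],
]
per_frame = [max(range(4), key=f.__getitem__) for f in frame_scores]
decoded = decode_counts(frame_scores)
print(per_frame)  # [0, 0, 1, 3, 2, 2]
print(decoded)    # [0, 0, 1, 1, 2, 2]
```

In this toy input, the per-frame decision at the fourth frame is inconsistent with its neighbors, and the decoded sequence replaces it with the only value the assumed transition rules allow. In practice the transition probabilities would be estimated from data rather than fixed by hand.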