Accurate Overlay Text Extraction for Digital Video Analysis
Dongqing Zhang and Shih-Fu Chang

Abstract

This report describes a system that detects and extracts overlay text in digital video.
Unlike previous approaches, the system uses multiple hypothesis testing. Regions of interest (ROIs)
likely to contain overlay text are first decomposed into several hypothetical binary images using
color space partitioning. In each binary image, a grouping algorithm then groups the identified
character blocks into text lines. If the layout of the grouped text lines conforms to the verification
rules, the bounding boxes of these grouped blocks are output as the detected text regions. Finally,
motion verification is used to reduce false alarms.
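The multiple-hypothesis loop above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the report's implementation. A uniform split of the luminance axis stands in for the unspecified color space partitioning. The helper names (partition_colors, components, is_char, group_lines, detect_text) and every threshold (max_h=16, max_gap=3, min_chars=2) are hypothetical, and motion verification is omitted.

```python
from collections import deque


def partition_colors(roi, k=4):
    """Split an ROI of (r, g, b) pixels into k binary hypothesis images.

    Assumption: a uniform split of the luminance axis stands in for the
    report's color space partitioning.
    """
    h, w = len(roi), len(roi[0])
    images = [[[0] * w for _ in range(h)] for _ in range(k)]
    for y in range(h):
        for x in range(w):
            r, g, b = roi[y][x]
            lum = 0.299 * r + 0.587 * g + 0.114 * b  # roughly 0..255
            images[min(int(lum * k / 256), k - 1)][y][x] = 1
    return images


def components(binary):
    """Bounding boxes (x0, y0, x1, y1) of 4-connected foreground regions."""
    h, w = len(binary), len(binary[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for sy in range(h):
        for sx in range(w):
            if not binary[sy][sx] or seen[sy][sx]:
                continue
            seen[sy][sx] = True
            queue = deque([(sx, sy)])
            x0, y0, x1, y1 = sx, sy, sx, sy
            while queue:  # breadth-first flood fill
                x, y = queue.popleft()
                x0, y0, x1, y1 = min(x0, x), min(y0, y), max(x1, x), max(y1, y)
                for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                    if 0 <= nx < w and 0 <= ny < h and binary[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            boxes.append((x0, y0, x1, y1))
    return boxes


def is_char(box, min_h=2, max_h=16):
    """Hypothetical size rule for a character block."""
    x0, y0, x1, y1 = box
    return min_h <= y1 - y0 + 1 <= max_h and x1 - x0 + 1 <= max_h


def group_lines(blocks, max_gap=3):
    """Chain blocks left to right into lines; each line is [x0, y0, x1, y1, count]."""
    lines = []
    for b in sorted(blocks):
        for line in lines:
            overlap = min(line[3], b[3]) - max(line[1], b[1]) + 1
            shorter = min(line[3] - line[1], b[3] - b[1]) + 1
            if overlap >= 0.5 * shorter and b[0] - line[2] <= max_gap:
                line[:] = [line[0], min(line[1], b[1]), max(line[2], b[2]),
                           max(line[3], b[3]), line[4] + 1]
                break
        else:  # no existing line accepted the block: start a new one
            lines.append([*b, 1])
    return lines


def detect_text(roi, k=4, min_chars=2):
    """Test every binary hypothesis and keep lines passing the layout rule."""
    found = []
    for binary in partition_colors(roi, k):
        chars = [b for b in components(binary) if is_char(b)]
        for x0, y0, x1, y1, n in group_lines(chars):
            if n >= min_chars and x1 - x0 > y1 - y0:  # hypothetical rule: wide, multi-char
                found.append((x0, y0, x1, y1))
    return found
```

Because each binary image is tested independently, a text line only needs to separate cleanly from its background in one hypothesis. The large background component, for example, fails the size rule in its own image without affecting the line found in another.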
To achieve real-time speed, ROI localization is performed with compressed-domain features,
including the DCT coefficients and motion vectors of MPEG video. In tests on digital news video,
the proposed method achieved an average recall of 96.9% and precision of 71.6%.
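The DCT part of the ROI localization might look like the sketch below. It assumes that overlay text produces strong horizontal intensity variation, which appears as large AC coefficients in the first row of each 8x8 block's DCT. In a real decoder these coefficients would be read directly from the intra-coded macroblocks of the MPEG stream. Here they are computed from pixels so the example is self-contained. The threshold, the min_run rule, and all function names (dct8x8, horizontal_energy, text_roi_blocks) are hypothetical, and motion vectors are not used.

```python
import math


def dct8x8(block):
    """Orthonormal 2-D DCT-II of an 8x8 block.

    Returns coeffs[u][v], where u is vertical and v is horizontal frequency.
    """
    def a(k):
        return math.sqrt(1 / 8) if k == 0 else math.sqrt(2 / 8)
    return [[a(u) * a(v) * sum(block[y][x]
                               * math.cos((2 * y + 1) * u * math.pi / 16)
                               * math.cos((2 * x + 1) * v * math.pi / 16)
                               for y in range(8) for x in range(8))
             for v in range(8)] for u in range(8)]


def horizontal_energy(coeffs):
    """Sum of |AC| magnitudes in the first row, i.e. horizontal intensity variation."""
    return sum(abs(c) for c in coeffs[0][1:])


def text_roi_blocks(luma, threshold=100.0, min_run=2):
    """Return (block_row, block_col) pairs of candidate text blocks.

    A block is a candidate when its horizontal AC energy exceeds a
    hypothetical threshold; only horizontal runs of at least min_run
    candidates are kept, since text lines span several blocks.
    """
    rows, cols = len(luma) // 8, len(luma[0]) // 8
    cand = []
    for by in range(rows):
        row = []
        for bx in range(cols):
            block = [r[bx * 8:bx * 8 + 8] for r in luma[by * 8:by * 8 + 8]]
            row.append(horizontal_energy(dct8x8(block)) > threshold)
        cand.append(row)
    roi = set()
    for by in range(rows):
        bx = 0
        while bx < cols:
            if not cand[by][bx]:
                bx += 1
                continue
            start = bx
            while bx < cols and cand[by][bx]:
                bx += 1
            if bx - start >= min_run:  # discard isolated textured blocks
                roi.update((by, x) for x in range(start, bx))
    return roi
```

Only the first row of coefficients is inspected, so the per-block cost in a compressed-domain version would be a handful of additions, which is what makes this stage cheap compared with full decoding.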