Accurate Overlay Text Extraction for Digital Video Analysis
Dongqing Zhang and Shih-Fu Chang
Abstract
This report describes a system for detecting and extracting overlay text in digital video.
Unlike previous approaches, the system uses a multiple hypothesis testing approach.
Regions of interest (ROIs) likely to contain overlay text are decomposed into several
hypothetical binary images using color space partitioning. A grouping algorithm then
groups the identified character blocks into text lines in each binary image. If the layout
of the grouped text lines conforms to the verification rules, the bounding boxes of the
grouped blocks are output as the detected text regions. Finally, motion verification is
used to reduce false alarms.
To achieve real-time speed, ROI localization is performed using compressed-domain features,
including DCT coefficients and motion vectors in MPEG videos. In testing on digital news
videos, the proposed method achieved an average recall of 96.9% and an average precision
of 71.6%.
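
The multiple hypothesis idea summarized above can be illustrated with a small sketch. It is not the authors' implementation. The uniform RGB quantization, the 4-connected block finder, the greedy left-to-right grouping, and the single verification rule (a minimum number of blocks per line) are all assumptions chosen for brevity, and every function name and parameter (partition_color_space, connected_blocks, group_into_lines, detect_text_regions, levels, max_gap, min_blocks) is hypothetical. ROI localization and motion verification are omitted.

```python
from collections import deque


def partition_color_space(image, levels=2):
    """Split an RGB image into one binary mask per quantized color bin.

    Assumption: uniform quantization of each channel into `levels` bins.
    `image` is a list of rows, each row a list of (r, g, b) tuples.
    """
    step = 256 // levels
    height, width = len(image), len(image[0])
    masks = {}
    for y, row in enumerate(image):
        for x, (r, g, b) in enumerate(row):
            key = tuple(min(c // step, levels - 1) for c in (r, g, b))
            if key not in masks:
                masks[key] = [[0] * width for _ in range(height)]
            masks[key][y][x] = 1
    return masks


def connected_blocks(mask):
    """Return bounding boxes (x0, y0, x1, y1) of 4-connected foreground regions."""
    height, width = len(mask), len(mask[0])
    seen = [[False] * width for _ in range(height)]
    boxes = []
    for y in range(height):
        for x in range(width):
            if not mask[y][x] or seen[y][x]:
                continue
            queue = deque([(x, y)])
            seen[y][x] = True
            x0 = x1 = x
            y0 = y1 = y
            while queue:
                cx, cy = queue.popleft()
                x0, x1 = min(x0, cx), max(x1, cx)
                y0, y1 = min(y0, cy), max(y1, cy)
                for nx, ny in ((cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)):
                    if 0 <= nx < width and 0 <= ny < height:
                        if mask[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((nx, ny))
            boxes.append((x0, y0, x1, y1))
    return boxes


def group_into_lines(boxes, max_gap=3, min_blocks=3):
    """Greedily merge blocks, left to right, into candidate text lines.

    A block joins a line when it overlaps the line vertically and starts
    within `max_gap` pixels of the line's right edge. Lines with fewer than
    `min_blocks` blocks are rejected (a stand-in verification rule).
    """
    lines = []  # each entry: [x0, y0, x1, y1, block_count]
    for x0, y0, x1, y1 in sorted(boxes):
        for line in lines:
            overlaps = y0 <= line[3] and y1 >= line[1]
            close = x0 - line[2] <= max_gap
            if overlaps and close:
                line[1] = min(line[1], y0)
                line[2] = max(line[2], x1)
                line[3] = max(line[3], y1)
                line[4] += 1
                break
        else:
            lines.append([x0, y0, x1, y1, 1])
    return [tuple(line[:4]) for line in lines if line[4] >= min_blocks]


def detect_text_regions(image, levels=2):
    """Test every color-bin hypothesis and collect the lines that pass."""
    regions = []
    for mask in partition_color_space(image, levels).values():
        regions.extend(group_into_lines(connected_blocks(mask)))
    return regions
```

Each color bin yields one hypothetical binary image, and each is tested independently. A large uniform background forms a single connected block in its own bin, so it fails the minimum-block rule and is discarded, while a row of similarly colored character blocks survives as one bounding box.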