Scene Text Detection Using Higher-Order Statistical Relational Model



Scene text is text embedded in a visual object or scene within an image or video frame. Detecting scene text is much more challenging than detecting overlay text because of its large visual variations, for example in color and shape. Previous methods address the problem with rule-based approaches, which require manual parameter tuning to maximize detection performance. We propose a new principled method for scene text detection that learns a higher-order statistical relational model, formulating text detection as a probabilistic inference problem in a Higher-Order Markov Random Field (an MRF with higher-order potential functions). To make detection efficient, we have extended the conventional Loopy Belief Propagation for pairwise Markov Random Fields to a higher-order version that can be used in a Higher-Order MRF.
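For orientation, a third-order MRF over binary region labels can be written in the generic form below. The notation is an assumption chosen for illustration (x_i is the label of region i, y the observed region features, C the set of three-region cliques, phi and psi the unary and third-order potentials); it is not necessarily the parameterization used in the paper.

```latex
p(x \mid y) = \frac{1}{Z(y)} \prod_{i} \phi_i(x_i, y_i)
              \prod_{(i,j,k) \in \mathcal{C}} \psi_{ijk}(x_i, x_j, x_k, y_{ijk}),
\qquad
p(x_i \mid y) = \sum_{x \setminus x_i} p(x \mid y)
```

Under this reading, detection amounts to estimating the marginal p(x_i | y) for every region, which is the quantity that belief propagation approximates.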


Our method detects scene text by first segmenting an image into disjoint regions and then assigning each region a binary label (text or non-text). The probability of a label is a marginal probability of the Higher-Order MRF, which captures the attributes of the regions and the relations among them. The Higher-Order MRF is built on the region adjacency graph, as shown below. Its parameters can be learned from training data.

Scene text detection by a Higher-Order MRF and Loopy Belief Propagation
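As a rough illustration of the graph construction described above, the following Python sketch links regions whose centroids lie close together and then enumerates the three-region cliques on which higher-order potentials could be defined. The proximity-based adjacency rule, the `max_dist` threshold, and the function names `build_adjacency` and `three_cliques` are assumptions made for this example, not details taken from the method itself.

```python
from itertools import combinations
import math


def build_adjacency(centroids, max_dist):
    """Return edges (i, j), i < j, between regions whose centroids are close.

    Assumption: two regions are adjacent when their centroid distance is at
    most max_dist. The actual adjacency criterion may differ.
    """
    edges = set()
    for i, j in combinations(range(len(centroids)), 2):
        if math.dist(centroids[i], centroids[j]) <= max_dist:
            edges.add((i, j))
    return edges


def three_cliques(n_regions, edges):
    """List all triples of mutually adjacent regions (triangles in the graph)."""
    return [
        triple
        for triple in combinations(range(n_regions), 3)
        # combinations() yields sorted pairs, matching the (i, j), i < j edges.
        if all(pair in edges for pair in combinations(triple, 2))
    ]


if __name__ == "__main__":
    # Hypothetical centroids: three regions in a row and one far away.
    centroids = [(0, 0), (10, 0), (20, 0), (100, 100)]
    edges = build_adjacency(centroids, max_dist=25)
    print(three_cliques(len(centroids), edges))
```

The brute-force enumeration is cubic in the number of regions, which is acceptable for a sketch but would likely need a neighbor-list approach on heavily segmented images.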

The intuition behind using a Higher-Order MRF rather than a pairwise MRF is that the text regions in a text line often form distinctive spatial-attributive patterns that follow certain higher-order relational rules. For example, three character regions should lie on a straight line. These rules can be relaxed to probabilistic versions and encoded in the potential functions of the MRF (shown below). The traditional Loopy Belief Propagation (LBP) developed for pairwise MRFs can be extended to the Higher-Order MRF by following a similar mathematical derivation. Although in theory any higher-order MRF can be converted to a pairwise MRF, Loopy Belief Propagation in the converted pairwise MRF generally suffers from convergence problems. Performing Loopy Belief Propagation directly in the Higher-Order MRF makes the message passing more stable. This is confirmed by our experiments, in which we did not observe serious convergence problems.
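To make the straight-line rule and the higher-order message passing more concrete, here is a minimal Python sketch. It turns the collinearity of three region centroids into a soft score, places that score in a 2x2x2 potential table, and runs a generic sum-product scheme in which each three-region clique sends messages to its member nodes. The Gaussian-style score, the `sigma` and `gain` values, and all function names are assumptions for illustration; the potentials and the update rule derived in the paper may differ.

```python
import math
import numpy as np


def collinearity(c1, c2, c3, sigma=0.1):
    """Soft straight-line score in (0, 1]; equals 1 for collinear centroids."""
    (x1, y1), (x2, y2), (x3, y3) = c1, c2, c3
    area2 = abs((x2 - x1) * (y3 - y1) - (x3 - x1) * (y2 - y1))  # twice the triangle area
    longest = max(math.dist(c1, c2), math.dist(c2, c3), math.dist(c1, c3))
    if longest == 0:
        return 1.0  # coincident points are trivially collinear
    deviation = area2 / longest ** 2  # scale-invariant departure from a line
    return math.exp(-(deviation / sigma) ** 2)


def clique_potential(score, gain=2.0):
    """Table psi[x_i, x_j, x_k]; only the all-text entry depends on the score."""
    psi = np.ones((2, 2, 2))
    psi[1, 1, 1] = gain * score  # hypothetical weighting of the relaxed rule
    return psi


def higher_order_lbp(unary, cliques, psi, n_iter=20):
    """Approximate marginals for binary labels with third-order cliques.

    unary:   (n, 2) array of node potentials
    cliques: list of (i, j, k) tuples
    psi:     dict mapping each clique to a (2, 2, 2) potential table
    """
    unary = np.asarray(unary, dtype=float)
    # msg[(c, v)] is the message from clique c to its member node v.
    msg = {(c, v): np.full(2, 0.5) for c in cliques for v in c}
    for _ in range(n_iter):
        new = {}
        for c in cliques:
            # Node-to-clique terms: unary potential times messages from the
            # other cliques that contain the node.
            incoming = []
            for v in c:
                m = unary[v].copy()
                for c2 in cliques:
                    if c2 != c and v in c2:
                        m = m * msg[(c2, v)]
                incoming.append(m)
            for pos, v in enumerate(c):
                terms = [incoming[p] if p != pos else np.ones(2) for p in range(3)]
                table = (psi[c]
                         * terms[0][:, None, None]
                         * terms[1][None, :, None]
                         * terms[2][None, None, :])
                # Sum out the two other nodes of the clique.
                out = table.sum(axis=tuple(a for a in range(3) if a != pos))
                new[(c, v)] = out / out.sum()
        msg = new  # synchronous update
    belief = unary.copy()
    for (c, v), m in msg.items():
        belief[v] *= m
    return belief / belief.sum(axis=1, keepdims=True)
```

On a graph with a single clique the factor graph is a tree, so the returned beliefs equal the exact marginals; with overlapping cliques the result is an approximation whose convergence is not guaranteed.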



We compare the text detection performance of the Higher-Order MRF and the pairwise MRF. The test data are from the ICDAR 2003 data set. The following ROC curve shows that the Higher-Order MRF substantially outperforms the pairwise MRF model. An example of scene text detection is also shown below (right).
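For readers who want to run a comparison of this kind on their own detector outputs, the sketch below computes an ROC curve from per-region text probabilities and ground-truth labels by sweeping a decision threshold. It assumes region-level evaluation with both classes present, and the function names `roc_curve` and `auc` are chosen for this example; it does not reproduce the evaluation protocol or the results reported here.

```python
import numpy as np


def roc_curve(scores, labels):
    """Return (fpr, tpr) arrays obtained by lowering a threshold over scores.

    scores: predicted text probability per region
    labels: 1 for text regions, 0 for non-text regions
    Tied scores are treated as separate thresholds in this simple version.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    order = np.argsort(-scores, kind="stable")  # highest score first
    hits = labels[order]
    tpr = np.concatenate(([0.0], np.cumsum(hits) / hits.sum()))
    fpr = np.concatenate(([0.0], np.cumsum(~hits) / (~hits).sum()))
    return fpr, tpr


def auc(fpr, tpr):
    """Area under the ROC curve by the trapezoidal rule."""
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2))


if __name__ == "__main__":
    # Made-up scores for illustration only.
    fpr, tpr = roc_curve([0.9, 0.4, 0.7, 0.2], [1, 0, 1, 0])
    print(auc(fpr, tpr))
```

Computing one curve per model on the same regions and plotting them together gives the kind of side-by-side comparison described above.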



Dong-Qing Zhang

Prof. Shih-Fu Chang


Dong-Qing Zhang, Shih-Fu Chang. Learning to Detect Scene Text Using a Higher-order MRF with Belief Propagation. In IEEE Workshop on Learning in Computer Vision and Pattern Recognition, in conjunction with CVPR (LCVPR), Washington DC, June 2004.