Learning-Based Text Extraction In Natural Background

Posted on: 2008-08-08
Degree: Master
Type: Thesis
Country: China
Candidate: R J Jiang
Full Text: PDF
GTID: 2178360212976072
Subject: Computer application technology
Abstract/Summary:
Text recognition in natural scenes supports many important applications, but complex backgrounds and widely varying character appearance make text hard to localize and segment, and the immaturity of these two steps holds applications back. This thesis presents a novel machine-learning-based algorithm for extracting text from natural scenes. First, the algorithm uses NLNiblack to decompose the input image into multiple connected components (CCs), some of which are text and some non-text. To localize and segment text from the background, the goal is to preserve the text CCs while discarding the non-text CCs. Seventeen text features are proposed to discriminate text from non-text. All CCs are then verified by a 2-stage classification model composed of a cascade classifier and an SVM. The cascade consists of 17 weak classifiers, each concentrating on one feature. The first weak classifier receives all CCs. A CC judged to be non-text is rejected immediately; otherwise it is passed to the next weak classifier, and so on until the end of the cascade. The cascade filters out most of the non-text CCs, and the SVM then verifies the remaining CCs to give a more precise result. The final output is a binary image containing only text. Combining weak classifiers with a strong classifier gives the algorithm both efficiency and effectiveness. The thesis also proposes a pixel-wise criterion for evaluating the algorithm on a testing set, and the test results show satisfactory performance.
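The 2-stage verification described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation. The feature names (`aspect_ratio`, `fill_ratio`, `contrast`), the interval thresholds, and the `stub_verifier` function standing in for the trained SVM are all hypothetical, and only two cascade stages are shown instead of 17. The sketch shows only the control flow: each weak classifier tests one feature, a CC is rejected at the first stage it fails, and only the survivors reach the final verifier.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# A connected component (CC) is represented here only by its feature values.
Features = Dict[str, float]

# Hypothetical weak classifier: accept a CC when one named feature
# lies inside the closed interval [low, high].
WeakClassifier = Tuple[str, float, float]


def cascade_accepts(cc: Features, cascade: Sequence[WeakClassifier]) -> bool:
    """Return False at the first stage that rejects the CC (early exit)."""
    for name, low, high in cascade:
        if not (low <= cc[name] <= high):
            return False
    return True


def extract_text_ccs(
    ccs: Sequence[Features],
    cascade: Sequence[WeakClassifier],
    final_verifier: Callable[[Features], bool],
) -> List[Features]:
    """Stage 1: cascade filtering. Stage 2: final verification of survivors."""
    survivors = [cc for cc in ccs if cascade_accepts(cc, cascade)]
    return [cc for cc in survivors if final_verifier(cc)]


# Illustrative thresholds only; the thesis uses 17 features not listed here.
CASCADE: List[WeakClassifier] = [
    ("aspect_ratio", 0.1, 10.0),
    ("fill_ratio", 0.2, 0.9),
]


def stub_verifier(cc: Features) -> bool:
    """Placeholder for the SVM stage (assumption: a simple contrast test)."""
    return cc["contrast"] > 0.3


ccs = [
    {"aspect_ratio": 0.8, "fill_ratio": 0.5, "contrast": 0.6},   # passes both stages
    {"aspect_ratio": 25.0, "fill_ratio": 0.5, "contrast": 0.9},  # rejected by the cascade
    {"aspect_ratio": 1.2, "fill_ratio": 0.4, "contrast": 0.1},   # rejected by the verifier
]
kept = extract_text_ccs(ccs, CASCADE, stub_verifier)
```

Ordering the cheap single-feature tests first means that most non-text CCs never reach the more expensive final classifier, which is the efficiency argument made in the abstract.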
Keywords/Search Tags: Text Extraction, Text Localization, Text Segmentation, Text Features, 2-Stage Classification, Cascade Classifier