
Research On Text Detection And Recognition In Natural Images

Posted on: 2015-10-19
Degree: Doctor
Type: Dissertation
Country: China
Candidate: C Yao
Full Text: PDF
GTID: 1228330428965937
Subject: Information and Communication Engineering
Abstract/Summary:
As an important carrier of human thoughts and emotions, text plays a crucial role in daily life and production. Text is almost ubiquitous, especially in modern urban environments: posters, tags, name cards, license plates, guideposts and billboards all contain text. Text in natural scenes directly conveys high-level semantics and is therefore a key element of scene understanding. Automatic text detection and recognition have a wide range of applications in image retrieval, geo-location, human-computer interaction, robot navigation, self-driving cars and industrial automation.

Conventional optical character recognition (OCR) techniques are designed specifically for document images, which are usually obtained with high-precision scanners. The resolution of document images is high and the background is clean. Characters in document images appear in regular fonts and in a single color, so character segmentation and recognition are relatively easy. In contrast, text detection and recognition in natural scenes are extremely challenging. On the one hand, text in natural scenes is diverse: it can vary in font, color, scale, orientation and even language. On the other hand, factors such as low resolution, highlights, shadows, noise, blur and partial occlusion all make it difficult to detect and recognize text in natural scenes.

This thesis investigates the basic issues in scene text detection and recognition. Its focus is to explore novel, effective and robust representations and models that cope with the challenges of scene text detection and recognition. Concretely, the research contents of this thesis are as follows.

Firstly, we propose an approach to multi-oriented text detection. In reality, text in natural scenes may appear in arbitrary orientations, but most existing methods can only handle horizontal or near-horizontal text.
To remedy this limitation, we devise two sets of scale- and rotation-invariant features and a two-level classification scheme. Experiments on standard benchmarks demonstrate that the algorithm is able to detect text of different orientations and scales while suppressing false alarms. To better evaluate our algorithm and compare it with competing algorithms, we create a new dataset that includes various texts in diverse real-world scenarios; we also propose a performance evaluation protocol suitable for assessing algorithms for multi-oriented text detection.

In order to construct an end-to-end system for detecting and reading text of varying orientations, we treat text detection and recognition as a whole and share the same features and classification structure between the two tasks. Moreover, a new dictionary-search-based error correction method is proposed to improve recognition accuracy. We create a new image dataset to evaluate end-to-end systems for scene text recognition. Experiments on standard benchmarks confirm the excellent detection and recognition ability of the proposed end-to-end text recognition algorithm.

To further improve the recognition rate, we propose a novel multi-scale representation for scene text, called strokelets. This representation consists of a set of detectable primitives that are learned automatically from training examples in an unsupervised manner. Strokelets allow accurate and robust character identification while bypassing character segmentation, which is unreliable on natural images. Moreover, strokelets are robust to font variation, deformation, rotation, noise, blur and partial occlusion. Experiments show that the text recognition algorithm based on strokelets achieves state-of-the-art performance on several challenging datasets.
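As an illustration of the dictionary-search error correction idea mentioned in the abstract, the sketch below maps a raw recognition result to the nearest word in a lexicon by edit distance. The abstract does not describe the thesis's actual correction method, so this is only a generic stand-in: the function names `edit_distance` and `correct_word`, the `max_distance` threshold and the sample lexicon are all assumptions made for this example.

```python
from typing import Iterable


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a two-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def correct_word(raw: str, lexicon: Iterable[str], max_distance: int = 2) -> str:
    """Return the closest lexicon word (hypothetical helper, not from the thesis).

    The raw string is returned unchanged when no lexicon word lies within
    max_distance edits, so out-of-vocabulary text is not forced onto a word.
    """
    best_word, best_dist = raw, max_distance + 1
    for word in lexicon:
        d = edit_distance(raw.lower(), word.lower())
        if d < best_dist:
            best_word, best_dist = word, d
    return best_word


if __name__ == "__main__":
    sample_lexicon = ["hello", "help", "world"]  # made-up lexicon
    print(correct_word("he1lo", sample_lexicon))
```

A plain linear scan is used here for clarity; a real system with a large lexicon would likely index the dictionary and weight substitutions by how easily the classifier confuses particular characters.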
Keywords/Search Tags: Text Detection, Text Recognition, Natural Image, Arbitrary Orientation, Multi-Scale, Representation, High Level Semantics, Image Understanding