| With the rapid development of multimedia and Internet technology, images of natural scenes are becoming ever easier to obtain, and extracting the needed information from this abundance of images is becoming increasingly important. As computer vision and deep learning technology have advanced, the demand for efficient reading of text in natural scenes has also increased dramatically. Text localization and recognition in natural scenes has a growing range of applications, such as real-time translation of text in images, automatic indexing of videos and images, intelligent transportation systems, navigation aids for the blind, robot navigation systems, automatic location information services, and industrial automation. This thesis addresses the detection and recognition of text in natural scenes and implements an end-to-end system that locates and recognizes the text in natural scene images. The text types considered are English letters and digits; other scripts, such as Chinese, are outside the scope of this thesis. Text detection and recognition in natural scenes consists of two main steps: text region localization and text recognition. Building on a survey of strong existing algorithms from domestic and international research, this thesis studied the two parts in depth and realized an end-to-end system that combines the two steps. The main contents are as follows: (1) Applying general object detection and localization methods to text localization, and realizing a general algorithm for extracting text regions from complex natural scenes. The backgrounds of natural scene images are very complex, and some images contain many objects other than text. In some images the text regions blend tightly into the background, and in others the text is randomly distributed. All of these factors strongly affect the detection and localization of text regions. One objective of this thesis is to find a general algorithm to detect and
locate text regions in complex scenes. To solve this problem, this thesis modified and retrained Faster RCNN and Mask RCNN, which were originally designed for general object localization and classification, and achieved good results in localization accuracy and running time on the natural scene text region localization problem. This part is also a novel contribution of this thesis. (2) Studying general methods that recognize the complex and diverse text extracted from natural scenes with as few preprocessing operations as possible. Some characters in natural scene text adhere closely to one another, some appear in very complex fonts, and some are very hard to recognize because of overexposure or other noise. As a result, no general preprocessing method for denoising and segmentation can be found. Generality here means that no special preprocessing is performed on the image. Therefore, the other goal of this thesis is to find general methods that effectively recognize text extracted from natural scenes using as little preprocessing as possible. To this end, this thesis designed one method based on CNN, RNN and CTC and another based on CNN, RNN and an attention mechanism. After only simple grayscale conversion and size normalization of the natural scene text images, two kinds of neural networks based on the principle of whole-text recognition (as opposed to recognizing individual characters cut from the whole text) were designed and implemented. Compared with the open-source methods proposed by Jaderberg, we found that our recognition methods perform well in both accuracy and recognition time. |
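
The whole-text recognition principle behind the CNN, RNN and CTC method can be illustrated with a minimal sketch of greedy CTC decoding. This is not the network or the decoder used in this thesis; it only shows, under stated assumptions, how per-frame class scores can be turned into a string without first cutting the image into individual characters. The label set `ALPHABET`, the blank index `BLANK`, and the helper `one_hot_frames` are hypothetical names introduced for illustration.

```python
from typing import List, Sequence

# Hypothetical label set: digits and lowercase English letters.
ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyz"
# Assumed index of the CTC blank; class k (k >= 1) maps to ALPHABET[k - 1].
BLANK = 0


def ctc_greedy_decode(frame_scores: Sequence[Sequence[float]],
                      alphabet: str = ALPHABET,
                      blank: int = BLANK) -> str:
    """Greedy CTC decoding: take the best class per frame,
    merge consecutive repeats, then drop blanks."""
    decoded: List[str] = []
    previous = blank
    for scores in frame_scores:
        # Index of the highest-scoring class in this frame.
        best = max(range(len(scores)), key=lambda k: scores[k])
        # Emit a character only when it is not blank and differs
        # from the class chosen in the previous frame.
        if best != blank and best != previous:
            decoded.append(alphabet[best - 1])
        previous = best
    return "".join(decoded)


def one_hot_frames(text: str, alphabet: str = ALPHABET) -> List[List[float]]:
    """Build toy per-frame scores: each character twice, then one blank frame."""
    num_classes = len(alphabet) + 1
    frames: List[List[float]] = []
    for ch in text:
        k = alphabet.index(ch) + 1
        for cls in (k, k, BLANK):
            frames.append([1.0 if j == cls else 0.0 for j in range(num_classes)])
    return frames


if __name__ == "__main__":
    print(ctc_greedy_decode(one_hot_frames("hello2018")))
```

In this toy input the blank frame between the two "l" characters is what lets the decoder keep both of them; without it, consecutive identical predictions would be merged into one character.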