| With the advent of the digital age, multimedia data, especially image data, is everywhere in people's daily lives. Image data carries a large amount and variety of information, and extracting the text it contains is very useful for the subsequent collection, secondary processing, and documentation of useful information. Conventional image processing techniques handle standard scanned document images well. In natural scenes, however, interference factors such as complex content, incomplete data, and noise make extracting text from images still highly challenging. Over the past decade, with the development of deep learning, researchers have applied it to computer vision and made great progress in image detection and recognition tasks. On this basis, to detect and recognize Chinese and English text in natural scenes, this paper proposes a system that combines the YOLOv3 and CRNN models. A text orientation detection algorithm predicts whether the text is oriented at 0, 90, 180, or 270 degrees, and a small-angle estimation function predicts other angles. The image is rotated by the predicted angle, and the rotated image is then passed to the YOLOv3 network model, which performs text area detection to produce a series of text boxes. A text box clustering algorithm merges the detected text boxes into complete text lines. After optimizing the training of the YOLOv3 model and evaluating it on the RCTW-17 test set, the AP value reaches 0.66, the accuracy and recall rate also improve significantly, and the expected evaluation results are obtained. The detection results of the YOLOv3 model and the CTPN model are compared in terms of edges, image background light intensity, text size and tilt, text orientation, text arrangement direction, and detection speed. The results show that the YOLOv3 model detects text more completely and accurately. The CRNN model, which recognizes English better, is used for variable-length recognition of Chinese and English text and achieves a recognition accuracy of 97.85% on the validation set. Because the recognition result for a whole text line is a string without spaces, this paper uses the Viterbi algorithm to segment English text strings into words, which makes the output more readable. Comparing the detection and recognition results of the YOLOv3 and CRNN models with those of the CTPN and DenseNet models shows that the CRNN model recognizes both Chinese and English well, whereas the DenseNet model recognizes Chinese well but English poorly. In addition, the same test scene text images are recognized in the same environment: the average test time of the YOLOv3 and CRNN models is 0.4258 s, significantly shorter than the 0.8250 s average of the CTPN and DenseNet models. |
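
The Viterbi-based segmentation of unspaced English strings mentioned above can be illustrated with a minimal sketch. The abstract does not specify the language model used, so this example assumes a simple unigram word-probability model; the function name `viterbi_segment`, its parameters, and the toy table `demo_probs` are all hypothetical and chosen only for illustration.

```python
import math


def viterbi_segment(text, word_probs, max_word_len=20, unknown_prob=1e-10):
    """Return the most probable split of `text` into words.

    Assumes a unigram model: the score of a segmentation is the sum of the
    log-probabilities of its words. `word_probs` maps word -> probability.
    """
    n = len(text)
    # best[i] is the best log-probability of any segmentation of text[:i].
    best = [0.0] + [-math.inf] * n
    # back[i] is the start index of the last word in that best segmentation.
    back = [0] * (n + 1)
    unknown_log = math.log(unknown_prob)

    for end in range(1, n + 1):
        for start in range(max(0, end - max_word_len), end):
            word = text[start:end]
            if word in word_probs:
                log_p = math.log(word_probs[word])
            else:
                # Unknown words pay a per-character penalty (an assumption).
                log_p = len(word) * unknown_log
            score = best[start] + log_p
            if score > best[end]:
                best[end] = score
                back[end] = start

    # Follow the back-pointers from the end of the string to recover words.
    words = []
    i = n
    while i > 0:
        words.append(text[back[i]:i])
        i = back[i]
    return words[::-1]


# Toy probabilities, invented for this example only.
demo_probs = {
    "the": 0.05, "text": 0.01, "line": 0.01, "is": 0.03,
    "here": 0.01, "he": 0.01, "re": 0.001, "in": 0.03,
}
print(" ".join(viterbi_segment("thetextlineishere", demo_probs)))
```

In practice the word probabilities would come from a frequency list built on a large English corpus, and the recognizer's output line would be passed in place of the demo string.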