| Natural scene text detection, a form of object detection, aims to locate text in natural images and plays an important role in text reading systems. Existing methods can be roughly divided into traditional methods and deep learning methods. Most traditional methods require hand-designed, complex feature extraction algorithms and consist of multiple steps, which makes them difficult to optimize and inefficient. Deep learning methods instead use a convolutional neural network to learn and extract the features of text regions automatically, avoiding complicated hand-crafted feature extraction. They also involve fewer steps, can be trained end to end, and are easy to tune and optimize. Compared with traditional optical character recognition (OCR), text detection in natural scenes is more difficult because of complex backgrounds and strong interference. In addition, because of factors such as shooting angle and illumination, text in natural scenes is not always upright and horizontal, but can appear in arbitrary orientations and with perspective effects, distortion, and so on. Existing natural scene text detection algorithms can only detect horizontal or nearly horizontal text, and perform very poorly on oblique text. Inspired by the general object detection algorithms Faster R-CNN and SSD, this paper makes several designs specific to oriented text detection in natural scenes. A fully convolutional neural network that can be trained end to end is proposed, which contains only convolutional layers, pooling layers, and non-maximum suppression layers. To better match text regions of arbitrary orientation, two representations, the rotated rectangle and the quadrilateral box, are designed to replace the horizontal rectangular default box used in Faster R-CNN and SSD. The default boxes are not preset with manually chosen aspect ratios as in Faster R-CNN and SSD. Instead, the aspect ratios of the ground-truth text boxes in the dataset are clustered, using them as prior knowledge. The default text boxes obtained by clustering cover text regions in natural scenes better and reduce the number of default boxes at each position of the feature map, thereby improving the computational efficiency and speed of the algorithm. Different IoU calculation methods and matching algorithms are designed for the rotated rectangle and quadrilateral representations. Specifically, for rotated rectangles, the intersection region is divided into triangles by triangulation and its area is computed from them. For quadrilateral boxes, the IoU is computed on the minimum circumscribed horizontal rectangles, which avoids computing the irregular intersection area of quadrilaterals and reduces the complexity of the algorithm. Text boxes are predicted on multiple feature layers, which improves the ability to detect text at different scales in natural scene images. The proposed algorithm was evaluated on the ICDAR2015 Incidental Text benchmark dataset. The rotated rectangle version achieved 73.8% accuracy and an F-measure of 0.764, and the quadrilateral version achieved 77.1% accuracy and an F-measure of 0.777. To verify its detection performance on horizontal text in natural scenes, the algorithm was also tested on the ICDAR2013 dataset. The results show that the proposed algorithm not only detects text of arbitrary orientation in natural scenes effectively, but is also suitable for detecting horizontal text, achieving a good balance between accuracy and speed. |
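
The quadrilateral IoU described in the abstract, computed on the minimum circumscribed horizontal rectangles rather than on the quadrilaterals themselves, might look roughly like the sketch below. This is an illustration under stated assumptions, not the paper's implementation: the function names (`enclosing_rect`, `rect_iou`, `quad_iou`) and the corner-list input format are hypothetical choices made for this example.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float]
Rect = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def enclosing_rect(quad: Sequence[Point]) -> Rect:
    """Minimum axis-aligned (horizontal) rectangle containing all corners."""
    xs = [p[0] for p in quad]
    ys = [p[1] for p in quad]
    return min(xs), min(ys), max(xs), max(ys)


def rect_iou(a: Rect, b: Rect) -> float:
    """Intersection over union of two axis-aligned rectangles."""
    inter_w = min(a[2], b[2]) - max(a[0], b[0])
    inter_h = min(a[3], b[3]) - max(a[1], b[1])
    if inter_w <= 0 or inter_h <= 0:
        return 0.0  # rectangles do not overlap
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def quad_iou(q1: Sequence[Point], q2: Sequence[Point]) -> float:
    """Approximate quadrilateral IoU via the enclosing horizontal rectangles.

    Assumption: each quadrilateral is given as four (x, y) corners.
    """
    return rect_iou(enclosing_rect(q1), enclosing_rect(q2))


if __name__ == "__main__":
    default_box = [(0, 0), (4, 1), (4, 3), (0, 2)]
    truth_box = [(2, 0), (6, 1), (6, 3), (2, 2)]
    print(quad_iou(default_box, truth_box))
```

Because only the enclosing rectangles are compared, the result approximates the true polygon IoU; the trade-off is that no irregular intersection polygon has to be computed.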