| Text is a common element of natural scene images and carries rich semantic information, so accurate text detection is the first step in image understanding. With the rapid development of deep learning, methods based on object detection and methods based on semantic segmentation have become the two mainstream approaches to text detection. Object detection methods are limited by their default text boxes, whereas semantic segmentation methods predict every pixel in the image and are therefore better suited to arbitrary-shaped text. In this thesis, we propose a semantic-segmentation-based text detection method built on multi-scale feature pyramid fusion, which can detect arbitrary-shaped text in natural scene images. The main contributions are as follows: 1. To address the poor detection performance on closely spaced text in natural scene images, we propose a pyramid feature enhancement model. A top-down path and a bottom-up path enhance the information in feature maps at different levels, so the network obtains more semantic and location information, separates text instances better, and achieves higher segmentation precision. 2. To address inaccurate detection of large-scale text in natural scenes, we propose a multi-scale feature fusion network. A multi-channel network extracts features from input images at different scales, and we design two fusion methods, one based on the depth of the feature map and one based on its scale, to fuse the extracted multi-scale features. The network thereby obtains more global information, which improves large-scale text detection. 3. To handle arbitrary-shaped text lines, we design a text region generation algorithm based on pixel clustering. Starting from the predictions of the segmentation network, the text pixels in the image are clustered with the text kernels as cluster centers, so as to
obtain the pixel sets of the different text regions. Because this procedure is not constrained by the shape of the text in the image, it yields arbitrary-shaped text lines. To demonstrate the effectiveness of the proposed method, we conduct extensive experiments on two competitive benchmark datasets, CTW-1500 and ICDAR2015. CTW-1500 is explicitly designed for curved text detection; on this dataset we surpass the baseline by 6.3% in precision, 27.3% in recall, and 20.1% in F-measure. ICDAR2015 is designed for multi-oriented text detection; on this dataset we surpass the baseline by 2.8% in precision, 1.4% in recall, and 1.9% in F-measure. Compared with other mainstream text detection methods on CTW-1500, our method achieves the highest precision, 85.0%, and a comparable F-measure of 80.5%, which verifies its effectiveness for arbitrary-shaped text detection. |
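
The pixel clustering step in contribution 3 can be illustrated with a minimal sketch. The abstract does not state the clustering criterion, so the code below assumes one plausible choice: breadth-first region growing, in which each text kernel is a seed and every text pixel joins the kernel that reaches it first through the predicted text mask. The function names `label_kernels` and `cluster_text_pixels`, the binary-mask inputs, and the 4-connectivity are all assumptions made for illustration, not the thesis implementation.

```python
from collections import deque

import numpy as np

# 4-connected neighbourhood (an assumption; 8-connectivity would also work).
NEIGHBORS = ((1, 0), (-1, 0), (0, 1), (0, -1))


def label_kernels(kernel_mask):
    """Give each 4-connected kernel component its own positive integer label."""
    h, w = kernel_mask.shape
    labels = np.zeros((h, w), dtype=np.int32)
    next_label = 0
    for y in range(h):
        for x in range(w):
            if not kernel_mask[y, x] or labels[y, x]:
                continue
            next_label += 1
            labels[y, x] = next_label
            queue = deque([(y, x)])
            while queue:
                cy, cx = queue.popleft()
                for dy, dx in NEIGHBORS:
                    ny, nx = cy + dy, cx + dx
                    if (0 <= ny < h and 0 <= nx < w
                            and kernel_mask[ny, nx] and not labels[ny, nx]):
                        labels[ny, nx] = next_label
                        queue.append((ny, nx))
    return labels, next_label


def cluster_text_pixels(text_mask, kernel_mask):
    """Assign each text pixel to a kernel by growing all kernels in parallel.

    Hypothetical inputs: two binary maps of equal shape, as a segmentation
    network might predict (full text regions and shrunken text kernels).
    Returns a label map (0 = background) and the number of text regions.
    """
    text_mask = np.asarray(text_mask, dtype=bool)
    kernel_mask = np.asarray(kernel_mask, dtype=bool) & text_mask
    labels, count = label_kernels(kernel_mask)
    h, w = labels.shape
    # Multi-source BFS: all kernel pixels start in the queue together, so a
    # text pixel is claimed by whichever kernel reaches it in the fewest steps.
    queue = deque(zip(*np.nonzero(labels)))
    while queue:
        cy, cx = queue.popleft()
        for dy, dx in NEIGHBORS:
            ny, nx = cy + dy, cx + dx
            if (0 <= ny < h and 0 <= nx < w
                    and text_mask[ny, nx] and not labels[ny, nx]):
                labels[ny, nx] = labels[cy, cx]
                queue.append((ny, nx))
    return labels, count


if __name__ == "__main__":
    # Two text instances that touch in the text mask but have separate kernels.
    text = np.array([[1, 1, 1, 1, 1, 1]])
    kernels = np.array([[0, 1, 0, 0, 1, 0]])
    region_labels, n_regions = cluster_text_pixels(text, kernels)
    print(n_regions, region_labels)
```

In the small demo, the text mask is a single connected strip, yet the two kernels split it into two regions of three pixels each. This is the property that lets kernel-seeded clustering separate closely spaced text lines of any shape. Text pixels that are not connected to any kernel keep the label 0 in this sketch.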