| Object detection is an important branch of computer vision. Its purpose is to classify and locate objects in images. With the development of deep learning, object detection algorithms based on Convolutional Neural Networks (CNN) have gradually emerged. Their accuracy is clearly better than that of traditional algorithms, but the heavy computational cost of convolutional neural networks creates obstacles in practical applications. In addition, current work on text detection in natural scenes focuses mainly on Chinese and English characters, and there are few studies on Uyghur text detection. In view of this situation, this paper uses an improved YOLO V3 algorithm to achieve accurate and fast detection of Uyghur text in natural scene images, and designs and implements a Uyghur text detection system for natural scenes. The main contents of this paper are as follows: (1) This paper optimizes the network structure of the YOLO V3 algorithm to further improve the detection accuracy of Uyghur text. A Dense Block is introduced to replace the Res Block in the YOLO V3 network structure, which improves detection accuracy by reusing image feature information. Examination of the multi-scale prediction in the original network structure reveals the problem of a single receptive field. To solve this problem, the Feature Pyramid Networks (FPN) structure in YOLO V3 is replaced by a Trident Block, which contains dilated convolutions. Changing the receptive field improves the utilization of image features, and the Trident Block is used to realize multi-scale prediction of objects in the image. Experiments show that the network structure designed in this paper further improves the accuracy of Uyghur text detection. (2) In the YOLO V3 network, the use of a large number of convolutional layers involves a large 
number of parameters and high computational complexity. Inspired by MobileNet V2, a lightweight CNN structure widely used on mobile devices, and based on a detailed understanding of the structure of YOLO V3, Depthwise Separable Convolution is introduced into YOLO V3 to replace the 3 × 3 ordinary convolutions in the original network. Reducing the number of convolution parameters and the amount of computation increases detection speed. Drawing on the idea of BN (Batch Normalization) fusion, the BN layers are fused into the Depthwise Separable Convolution, which reduces the time spent passing through the BN layer and further improves detection speed. (3) A Uyghur text detection system for natural scenes is designed and implemented. |
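
To illustrate why replacing a 3 × 3 ordinary convolution with a Depthwise Separable Convolution reduces parameters, the following minimal sketch compares the weight counts of the two layer types. The channel counts (256 in, 512 out) are hypothetical values chosen for illustration and are not taken from the network in this paper; biases are ignored.

```python
def standard_conv_params(c_in, c_out, k=3):
    """Weights in a k x k ordinary convolution (bias ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(c_in, c_out, k=3):
    """Weights in a k x k depthwise conv followed by a 1 x 1 pointwise conv."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 conv that mixes channels
    return depthwise + pointwise


# Hypothetical channel counts, chosen only for illustration.
c_in, c_out = 256, 512
std = standard_conv_params(c_in, c_out)
sep = depthwise_separable_params(c_in, c_out)
ratio = sep / std  # equals 1 / c_out + 1 / k**2

print(f"standard: {std}, separable: {sep}, ratio: {ratio:.4f}")
```

For a 3 × 3 kernel the ratio approaches 1/9 as the number of output channels grows, which is the source of the speed-up described in (2).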
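
The idea of BN fusion can be sketched for a single depthwise channel as follows. This is a minimal illustration under stated assumptions, not the implementation used in this paper: the function names, the 3 × 3 patch, and all numeric values are hypothetical. Because BN at inference is an affine transform per channel, its scale and shift can be folded into the convolution's weights and bias, so the separate BN step is no longer needed.

```python
import math


def fuse_bn(kernel, bias, gamma, beta, mean, var, eps=1e-5):
    """Fold inference-time BN statistics into one channel's kernel and bias."""
    scale = gamma / math.sqrt(var + eps)
    fused_kernel = [[w * scale for w in row] for row in kernel]
    fused_bias = (bias - mean) * scale + beta
    return fused_kernel, fused_bias


def conv_at(patch, kernel, bias):
    """Response of one depthwise filter at a single spatial position."""
    total = sum(p * w
                for prow, krow in zip(patch, kernel)
                for p, w in zip(prow, krow))
    return total + bias


def batch_norm(x, gamma, beta, mean, var, eps=1e-5):
    """Inference-time batch normalization of a scalar activation."""
    return gamma * (x - mean) / math.sqrt(var + eps) + beta


# Hypothetical values for one channel, chosen only for illustration.
patch = [[0.5, -1.0, 2.0], [1.5, 0.0, -0.5], [3.0, 1.0, -2.0]]
kernel = [[0.1, 0.2, -0.1], [0.0, 0.3, 0.4], [-0.2, 0.1, 0.05]]
bias, gamma, beta, mean, var = 0.1, 1.5, -0.3, 0.2, 0.8

# Convolution followed by a separate BN step.
unfused = batch_norm(conv_at(patch, kernel, bias), gamma, beta, mean, var)

# Single convolution with BN folded into its parameters.
fused_kernel, fused_bias = fuse_bn(kernel, bias, gamma, beta, mean, var)
fused = conv_at(patch, fused_kernel, fused_bias)

print(abs(unfused - fused) < 1e-9)
```

The two paths agree up to floating-point rounding, so the fused layer can replace the convolution plus BN pair at inference time.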
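
The effect of dilated convolution on the receptive field, mentioned in (1), can be illustrated with the standard formula for the effective kernel size, k + (k - 1)(d - 1). The dilation rates 1, 2, and 3 below are assumed for illustration; the abstract does not state which rates the Trident Block in this paper uses.

```python
def effective_kernel_size(k, dilation):
    """Spatial extent covered by a k x k kernel with the given dilation rate."""
    return k + (k - 1) * (dilation - 1)


# Assumed dilation rates, one per branch; not taken from the paper.
for d in (1, 2, 3):
    size = effective_kernel_size(3, d)
    print(f"dilation {d}: covers {size} x {size} pixels")
```

Each branch keeps the same nine weights per filter but covers a different spatial extent, which is how parallel dilated branches can respond to objects at different scales.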