As a common and important element of daily life, street signs provide essential traffic guidance for pedestrians. With the rapid development of deep learning in recent years, it has become possible to rely on machine vision to provide the text information on street signs for visually impaired people, autonomous driving systems, and other applications. Although deep-learning-based text recognition technology is becoming increasingly mature, text recognition on natural scene images still faces great challenges. The complex and diverse backgrounds of street sign images in natural scenes, together with the influence of illumination, make it difficult to fully exploit image features during text detection; the wide variety of characters in street sign images and the similarity between character features seriously affect the accuracy of text recognition. To address these problems, this paper investigates two main aspects, street sign text detection and text recognition, aiming to propose an accurate and efficient end-to-end street sign text recognition algorithm. The main research work of this paper is as follows:

(1) To reduce the influence of illumination on street sign text detection in natural scenes, a multi-channel MSER (Maximally Stable Extremal Regions) image pre-processing algorithm is proposed. Ablation experiments first show that the R, G, B, and S color channels retain important image information under strong illumination; MSER text regions are then extracted from each of these four channels and merged to obtain the pre-processed image, which reduces the interference of illumination and complex backgrounds on text detection (a sketch of this pipeline is given below). Experimental results show that the proposed pre-processing method improves text detection in terms of accuracy, recall, and F-measure.

(2) To address the insufficient use of the relationship between high-level semantic information and contextual semantic information in the feature extraction network, caused by the complex environment during street sign text detection, this paper proposes an enhanced feature pyramid network for text detection, which includes a Feature Pyramid Route Enhancement (FPRE) module and a High-Level Feature Enhancement (HLFE) module. These modules allow low-level semantic information to be propagated upward more effectively and, by enhancing the high-level semantic information, make fuller use of both the low-level and high-level features of the network (an illustrative sketch follows below). Experiments show that the proposed modules effectively improve the accuracy and recall of text detection.

(3) To address the failure to recognize street sign text accurately when strokes are complex and character structures are similar, this paper proposes a feature extraction network based on an attention mechanism. The channel attention mechanism SENet (squeeze-and-excitation network) is introduced at appropriate positions in the backbone network, which allows the model to emphasize the more informative channel features during feature extraction (see the sketch below). In addition, the Exponential Linear Unit (ELU) activation function is introduced to reduce the effect of bias shift and accelerate network learning on large multi-class datasets.
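The multi-channel MSER pre-processing of contribution (1) can be illustrated with a short OpenCV sketch. The channel choice (R, G, B, and the S channel of HSV) follows the description above; the default MSER parameters, the convex-hull filling, and the union-style merge into a single mask are illustrative assumptions rather than the exact procedure used in the thesis.

```python
import cv2
import numpy as np


def multi_channel_mser(bgr_image):
    """Sketch of multi-channel MSER pre-processing.

    MSER regions are extracted from the R, G, B channels and the S channel
    of the HSV representation, then merged into one binary mask that keeps
    candidate text regions. Parameters and the merge rule are illustrative.
    """
    b, g, r = cv2.split(bgr_image)
    s = cv2.split(cv2.cvtColor(bgr_image, cv2.COLOR_BGR2HSV))[1]

    mser = cv2.MSER_create()  # default parameters; tune delta/min_area as needed
    mask = np.zeros(bgr_image.shape[:2], dtype=np.uint8)

    for channel in (r, g, b, s):
        regions, _ = mser.detectRegions(channel)
        for points in regions:
            hull = cv2.convexHull(points.reshape(-1, 1, 2))
            cv2.fillConvexPoly(mask, hull, 255)  # union of regions from all channels

    # Keep only the image content inside the merged MSER regions.
    return cv2.bitwise_and(bgr_image, bgr_image, mask=mask)
```

The merged result can then be fed to the text detector in place of the raw image, so that strongly lit or cluttered background areas outside the candidate regions contribute less interference.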
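For contribution (2), the exact structure of the FPRE and HLFE modules is defined in the thesis itself. The sketch below only illustrates the general idea of adding an extra bottom-up route on top of a standard feature pyramid so that enhanced low-level information also reaches the higher pyramid levels; the layer names, channel sizes, and the use of stride-2 convolutions for the upward route are assumptions for illustration only.

```python
import torch.nn as nn
import torch.nn.functional as F


class RouteEnhancedFPN(nn.Module):
    """Illustrative FPN with an extra bottom-up path.

    The added bottom-up route lets low-level features flow upward to the
    higher pyramid levels, which is the general idea behind route
    enhancement; the concrete FPRE/HLFE design in the thesis may differ.
    """

    def __init__(self, in_channels=(256, 512, 1024, 2048), out_channels=256):
        super().__init__()
        self.lateral = nn.ModuleList(
            nn.Conv2d(c, out_channels, kernel_size=1) for c in in_channels)
        self.smooth = nn.ModuleList(
            nn.Conv2d(out_channels, out_channels, kernel_size=3, padding=1)
            for _ in in_channels)
        # Stride-2 convolutions realise the extra bottom-up (low-to-high) route.
        self.downsample = nn.ModuleList(
            nn.Conv2d(out_channels, out_channels, kernel_size=3,
                      stride=2, padding=1) for _ in in_channels[:-1])

    def forward(self, feats):  # feats: backbone stages C2..C5, highest resolution first
        laterals = [lat(f) for lat, f in zip(self.lateral, feats)]

        # Standard top-down FPN pathway.
        for i in range(len(laterals) - 1, 0, -1):
            laterals[i - 1] = laterals[i - 1] + F.interpolate(
                laterals[i], size=laterals[i - 1].shape[-2:], mode="nearest")
        pyramid = [sm(p) for sm, p in zip(self.smooth, laterals)]

        # Extra bottom-up route: push the enhanced low-level features upward.
        outputs = [pyramid[0]]
        for i, down in enumerate(self.downsample):
            outputs.append(pyramid[i + 1] + down(outputs[i]))
        return outputs  # enhanced pyramid levels N2..N5
```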
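For contribution (3), the squeeze-and-excitation block is a published, standard component, so a minimal PyTorch version is sketched below. ELU is shown inside the excitation MLP purely for illustration; the thesis may instead apply ELU elsewhere in the backbone, and the reduction ratio and insertion points are likewise assumptions.

```python
import torch.nn as nn


class SEBlock(nn.Module):
    """Squeeze-and-excitation channel attention (Hu et al.).

    ELU is used here in place of the usual ReLU in the excitation MLP as an
    illustration of combining channel attention with the ELU activation.
    """

    def __init__(self, channels, reduction=16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)  # squeeze: global average pooling
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ELU(inplace=True),            # ELU mitigates bias shift
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),                    # per-channel gating weights
        )

    def forward(self, x):
        b, c, _, _ = x.shape
        weights = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)
        return x * weights                   # re-weight the feature channels


# Hypothetical usage: append the block to one stage of a ResNet-style backbone.
# enhanced_stage = nn.Sequential(existing_stage, SEBlock(channels=512))
```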