| Text is the main medium of information exchange and cultural inheritance and plays an extremely important role in human society. Images of natural scenes usually contain a wealth of text information, and extracting this text accurately and efficiently helps us understand the images and the scenes they depict. In recent years, with the development of deep learning, text detection and recognition in natural scenes has attracted increasing attention from researchers. Traditional natural scene text extraction is usually divided into two independent parts, a text detection branch and a text recognition branch, which are implemented and executed separately. Although this approach is simple, it discards much of the prior knowledge that the two tasks could share. Text detection and text recognition are complementary tasks. If they share convolutional computation and exploit the complementary supervision between them, a network can learn more general image features and improve both the speed and the accuracy of the whole text recognition pipeline. This paper focuses on end-to-end text detection and recognition networks. The proposed network builds on FOTS, a current end-to-end text detection and recognition network with good performance. The detection branch of FOTS has several shortcomings, including weak feature extraction, an insufficient receptive field and unbalanced sample weights, which weaken its ability to detect long text and lower the model's accuracy.
To address these problems, this paper improves both the network structure and the loss function. For the network structure, ResNet-50 replaces VGG-16 to strengthen feature extraction, and an ASPP (atrous spatial pyramid pooling) module is added to enlarge the receptive field, which improves the detection of long text. For the loss function, Dice loss is used as the classification loss of the text detection branch, and the weights of text regions at different scales are balanced; these changes effectively improve the accuracy of the whole end-to-end text detection and recognition network. Finally, we evaluate the proposed network on public benchmarks and on minority-language datasets from a real engineering project. The experimental results show that the proposed end-to-end network reaches an advanced level of performance and has strong research and practical value. |
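ASPP enlarges the receptive field by running several atrous (dilated) convolutions in parallel at different dilation rates and fusing their outputs. The abstract does not state the module's configuration. The arithmetic sketch below therefore assumes 3x3 kernels and the rates 6, 12 and 18 used by DeepLabv3's ASPP, with rate 1 (a plain 3x3 convolution) included for comparison. The 1x1 and image-pooling branches of that design are omitted, and the function names are hypothetical.

```python
from typing import Dict, Iterable


def dilated_kernel_extent(kernel_size: int, dilation: int) -> int:
    """Feature-map cells spanned (per axis) by one dilated convolution kernel.

    A k x k kernel with dilation d places its taps d cells apart, so it
    spans k + (k - 1) * (d - 1) cells while still having only k * k weights.
    """
    if kernel_size < 1 or dilation < 1:
        raise ValueError("kernel_size and dilation must be positive integers")
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def aspp_branch_extents(kernel_size: int = 3,
                        rates: Iterable[int] = (1, 6, 12, 18)) -> Dict[int, int]:
    """Extent of each parallel atrous branch, keyed by dilation rate.

    The rates are an assumption (DeepLabv3's 6/12/18 plus rate 1 for
    comparison); the abstract does not state the rates actually used.
    """
    return {rate: dilated_kernel_extent(kernel_size, rate) for rate in rates}


if __name__ == "__main__":
    for rate, extent in aspp_branch_extents().items():
        print(f"3x3 conv, rate {rate:2d}: spans {extent:2d} cells with 9 weights per channel pair")
```

With these assumed rates the branches span 3, 13, 25 and 37 feature-map cells per axis. A single ASPP layer can therefore relate pixels along a long text line that a plain 3x3 convolution would only reach after many stacked layers. The weight count does not grow with the rate, so this widening adds no parameters per branch. The span in input pixels is larger still, roughly multiplied by the stride of the feature map the module operates on.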
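The abstract does not give the exact form of the Dice loss used for the detection branch's text/non-text classification. A common formulation is L = 1 - 2·Σ(p·g) / (Σp + Σg), where p is the predicted text score and g the binary ground-truth mask at each pixel. The sketch below assumes that form with a small smoothing term `eps`, which is also an assumption. It operates on plain Python lists of per-pixel scores and contrasts the Dice loss with mean binary cross-entropy on a hypothetical 100-pixel map in which only 2 pixels are text.

```python
import math
from typing import Sequence


def dice_loss(pred: Sequence[float], gt: Sequence[float], eps: float = 1e-6) -> float:
    """Dice loss between a flattened score map `pred` (values in [0, 1])
    and a flattened binary ground-truth text mask `gt`.

    Assumed form: 1 - 2 * sum(p * g) / (sum(p) + sum(g) + eps).
    """
    if len(pred) != len(gt):
        raise ValueError("pred and gt must contain the same number of pixels")
    overlap = sum(p * g for p, g in zip(pred, gt))
    denom = sum(pred) + sum(gt) + eps
    return 1.0 - 2.0 * overlap / denom


def mean_bce(pred: Sequence[float], gt: Sequence[float], eps: float = 1e-7) -> float:
    """Per-pixel binary cross-entropy averaged over all pixels (for comparison)."""
    total = 0.0
    for p, g in zip(pred, gt):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(g * math.log(p) + (1.0 - g) * math.log(1.0 - p))
    return total / len(pred)


if __name__ == "__main__":
    # Hypothetical 100-pixel map: 2 text pixels, 98 background pixels.
    gt = [1.0] * 2 + [0.0] * 98
    # A poor prediction that scores every pixel as almost certainly background.
    pred = [0.01] * 100
    print(f"Dice loss: {dice_loss(pred, gt):.3f}")
    print(f"Mean BCE:  {mean_bce(pred, gt):.3f}")
```

On this toy map the Dice loss stays close to 1 (about 0.99), while the mean cross-entropy is only about 0.10. The reason is that the 98 easy background pixels dominate the averaged cross-entropy but contribute nothing to the Dice overlap term. This illustrates why Dice loss is less sensitive to the imbalance between text and background pixels. It is not a reproduction of the paper's training setup.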