Research On Localization And Recognition Technology Of Handwritten Meteorological Archives By Deep Learning Method

Posted on: 2021-06-22
Degree: Master
Type: Thesis
Country: China
Candidate: K H Jia
GTID: 2480306032480754
Subject: Master of Engineering
Abstract/Summary:
Meteorological archives provide important data support for fields such as meteorological monitoring and scientific research, and have high preservation and research value. Most existing historical meteorological files are handwritten paper documents. To preserve and protect them better, they need to be stored digitally. The conventional method of digitization is to read the data manually, type it in, and save it as an electronic file. This manual process is complicated, error-prone, and slow; it often requires a great deal of manpower and material resources, and its efficiency cannot be guaranteed. In recent years, the rapid development of artificial intelligence has brought many conveniences to industrial production, and replacing manual labor with artificial intelligence is the trend of the times. The work of this thesis is to use deep learning to locate the text areas of an archive and automatically recognize the content to be entered, reducing manual workload and improving efficiency.

Digitizing meteorological files involves two main tasks: text localization and character recognition. Compared with traditional text localization methods, deep learning methods achieve higher accuracy. Because the length of the text is not fixed, which distinguishes the task from conventional object detection, the Connectionist Text Proposal Network (CTPN), which is sensitive to sequence information, is selected as the localization network. The text targets in the scanned images are small and densely arranged. CTPN localizes such small targets poorly, which makes the model harder to train and ultimately degrades the subsequent character recognition. To address the difficulty of locating tiny targets, this thesis adopts a local-to-global strategy: the scanned image is automatically divided into regions according to the characteristics of the archive data, and localization is performed within each sub-region. Global training and local training were carried out separately on the scanned images. With the same number of training epochs, the locally trained model converges faster and localizes accurately, whereas the globally trained model produces so many false detections and missed detections that it cannot be used in practice.

The character recognition task also presents many difficulties. Handwriting differs from one recorder to another, so the same character can look very different depending on who wrote it. Handwritten characters may also touch one another, so they cannot be segmented individually. To improve the generalization ability of the model, a large amount of training data was prepared to ensure diversity, so that text in different handwriting can be recognized. Because traditional methods cannot recognize touching characters, this thesis chooses a Convolutional Recurrent Neural Network (CRNN), which combines a Convolutional Neural Network (CNN) with Long Short-Term Memory (LSTM), to recognize continuous text. The handwritten files also contain altered characters. A neural network cannot correctly recognize characters that have been altered or crossed out, which makes the recognition task much harder. To handle this, based on the difference between unaltered and altered data, this thesis constructs two datasets with different features and trains a separate model on each, so that the features of altered characters are learned. While ensuring that unaltered text is recognized correctly, altered text is flagged and set aside for later manual review.

Finally, the method is evaluated on the test set. Images for which the two models give different recognition results are set aside for later manual review, and the remaining images are compared with their labels to determine accuracy. The recognition accuracy exceeds 99.7%. Compared with the results of Tencent's image recognition algorithm, the algorithm in this thesis has a clear advantage in recognizing touching characters and can accurately identify altered characters. This work can significantly reduce manual workload and improve efficiency.
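
The screening step described in the abstract, in which images where the two models disagree are routed to manual review and accuracy is measured on the rest, can be sketched roughly as follows. This is only a minimal illustration and not the thesis's implementation: the function names, the dictionary-based data layout, and the sample values are all hypothetical, and the model outputs are assumed to be plain strings keyed by an image identifier.

```python
from typing import Dict, List, Tuple


def screen_by_agreement(
    preds_a: Dict[str, str],
    preds_b: Dict[str, str],
) -> Tuple[Dict[str, str], List[str]]:
    """Split images into auto-accepted results and a manual-review queue.

    preds_a / preds_b: hypothetical outputs of the two separately trained
    recognition models, keyed by image identifier.
    """
    accepted: Dict[str, str] = {}
    manual_review: List[str] = []
    for image_id, text_a in preds_a.items():
        text_b = preds_b.get(image_id)
        if text_a == text_b:
            # Both models agree: keep the result automatically.
            accepted[image_id] = text_a
        else:
            # Disagreement (or a missing prediction): defer to a human.
            manual_review.append(image_id)
    return accepted, manual_review


def accuracy(accepted: Dict[str, str], labels: Dict[str, str]) -> float:
    """Fraction of auto-accepted images whose text matches the label."""
    if not accepted:
        return 0.0
    correct = sum(1 for k, v in accepted.items() if labels.get(k) == v)
    return correct / len(accepted)


# Made-up sample values, for illustration only.
preds_a = {"img1": "12.5", "img2": "1013", "img3": "7.8"}
preds_b = {"img1": "12.5", "img2": "1018", "img3": "7.8"}
labels = {"img1": "12.5", "img2": "1013", "img3": "7.3"}

accepted, manual_review = screen_by_agreement(preds_a, preds_b)
print(manual_review)               # ['img2']
print(accuracy(accepted, labels))  # 0.5
```

Note that the accuracy computed this way covers only the images on which both models agree; the images set aside for manual review are excluded from it, as in the evaluation the abstract describes.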
Keywords/Search Tags: Text positioning, Character recognition, Deep learning, Convolutional neural network (CNN), Recurrent neural network (RNN)