Research On Manchu Text Detection And Recognition Method Based On Deep Learning

Posted on: 2023-11-01
Degree: Master
Type: Thesis
Country: China
Candidate: D L Sun
Full Text: PDF
GTID: 2558306848454544
Subject: Software engineering
Abstract/Summary:
Minority languages carry a rich and varied part of Chinese culture and play an irreplaceable role in the development of national culture. In recent times, however, they have been used less and less in daily life, leaving many of them endangered or nearly so and making the culture behind them difficult to pass on. Manchu was the national language of the Qing Dynasty, and the large number of ancient books left from that period are a precious historical and cultural heritage. The woodblock-printed Manchu text preserved in many of these books urgently needs to be detected and recognized by efficient and accurate OCR technology, which would help historians overcome the language barrier and advance historical research on the Qing Dynasty. This thesis focuses on Manchu data processing, text detection, and text recognition, and studies the related algorithms in depth. The work covers the following three aspects:

(1) Based on the distinctive character and structural features of Manchu, this thesis automatically generates a data set suited to woodblock-printed Manchu recognition and preprocesses the detection and recognition data. First, following the complex writing rules and displayed forms of Manchu and taking the smallest displayed form as the label unit, it proposes a Manchu label mapping scheme that effectively handles word diversity. Second, to save the considerable time and labor that manual annotation requires, it designs a complete pipeline for automatically generating text lines, together with data augmentation methods tailored to the characteristics of Manchu characters and the data set scenes, in order to build and expand the Manchu recognition sample set.

(2) This thesis studies and proposes a Manchu text detection method based on an improved DBNet that achieves word-level Manchu text detection. First, because the aspect ratio of Manchu words varies widely and most words are long and narrow, a shrinking and expansion algorithm driven by the word's aspect ratio is designed to improve the accuracy of the expanded region in DBNet. Second, the influence of the loss function on the detection results is analyzed, and a Gaussian kernel and an area influence factor are introduced into the loss function. Finally, experiments on the woodblock-printed Manchu image data set verify the superiority of the method.

(3) This thesis studies and proposes a Manchu character recognition algorithm based on an improved CTC that achieves character-level Manchu recognition. First, it describes the general pipeline of traditional OCR methods and analyzes their improvements and remaining problems. Second, in view of the particular demands of Manchu character recognition, it designs a recognition model based on a mixed CTC-attention loss function. Finally, the model is trained on the generated data set, and the woodblock-printed Manchu text-line data set is used to verify the effectiveness and robustness of the method.
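
As an illustration of the aspect-ratio idea in aspect (2), the sketch below starts from the standard DBNet shrink offset, D = A(1 - r^2)/L (polygon area A, perimeter L, shrink ratio r, commonly 0.4), and shows how a ratio that depends on the box's aspect ratio changes the offset for a long, narrow word. The function `aspect_aware_ratio` and its parameters `base_ratio` and `max_ratio` are assumptions made for this example only; the abstract does not state the rule the thesis actually uses.

```python
import math


def polygon_area(points):
    """Shoelace formula; points is a list of (x, y) vertices in order."""
    total = 0.0
    n = len(points)
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def polygon_perimeter(points):
    """Sum of edge lengths of the closed polygon."""
    n = len(points)
    return sum(math.dist(points[i], points[(i + 1) % n]) for i in range(n))


def shrink_offset(points, ratio=0.4):
    """Standard DBNet offset D = A * (1 - r^2) / L for a text polygon."""
    return polygon_area(points) * (1.0 - ratio ** 2) / polygon_perimeter(points)


def aspect_aware_ratio(width, height, base_ratio=0.4, max_ratio=0.8):
    """HYPOTHETICAL rule (not taken from the thesis): raise the shrink ratio
    for elongated boxes so that the shrunk kernel of a narrow word does not
    become too thin. A square box keeps base_ratio; the ratio approaches
    max_ratio as the box gets more elongated."""
    aspect = max(width, height) / min(width, height)
    return max_ratio - (max_ratio - base_ratio) / aspect


# A narrow vertical Manchu word box: 20 px wide, 200 px tall.
box = [(0, 0), (20, 0), (20, 200), (0, 200)]

fixed = shrink_offset(box)                                # fixed ratio 0.4
adaptive = shrink_offset(box, aspect_aware_ratio(20, 200))  # hypothetical ratio

print(f"fixed-ratio offset:    {fixed:.2f} px")
print(f"adaptive-ratio offset: {adaptive:.2f} px")
```

For this 20 x 200 box, the fixed ratio shrinks each side by about 7.6 px, leaving a kernel under 5 px wide, whereas the hypothetical adaptive ratio shrinks it by about 3.8 px and leaves a kernel roughly 12 px wide. That is the kind of effect an aspect-ratio-dependent rule is meant to achieve for long, narrow words.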
Keywords/Search Tags: Manchu ancient books, deep learning, character recognition, text detection, image recognition