
Research On Tibetan Ujin Print Recognition Technology Based On Syllable Segmentation

Posted on: 2021-02-07 | Degree: Master | Type: Thesis
Country: China | Candidate: R D Z Cai | Full Text: PDF
GTID: 2435330620975885 | Subject: Computer application technology
Abstract/Summary:
Character recognition combines pattern recognition, image processing and character processing. Printed Tibetan recognition is an important part of Tibetan information processing and can reduce the cost of Tibetan text input, editing and processing. It has important application value in Tibetan-related publishing, office automation, the collation of ancient books and digital libraries. Compared with other scripts, Tibetan syllables have closely spaced characters horizontally and a varying number of stacked layers vertically. Moreover, the number of syllable classes is large, which makes printed Tibetan recognition challenging.

Traditional printed Tibetan recognition uses the Tibetan character as the segmentation unit and combines rules with statistics. With the development of information technology, neural network models have achieved great success in image recognition. Such models can learn the structural features of longer segmentation units from large-scale data sets and can effectively improve the performance of a recognition system. This thesis therefore takes the Tibetan Ujin font as its research object and studies recognition with the Tibetan syllable as the recognition unit. The main work is as follows:

1. To solve the problem of aligning training images with their labels, and to build a Tibetan syllable corpus with high coverage, this thesis analyzes the combination structure of Tibetan characters, proposes a mixed-mode method for segmenting Tibetan text into syllables, and develops an automatic Tibetan syllable segmentation system. With this system, a corpus of 626 kinds of Tibetan and Sanskrit characters and 19,450 Tibetan syllable texts is collected.

2. To address the close spacing of Tibetan glyphs and the resulting difficulty of segmenting text images, this thesis proposes a syllable-based text image segmentation algorithm and an image blank edge normalization algorithm. The segmentation method correctly separates characters that adhere to one another and reduces the probability of segmentation errors. Its segmentation accuracy is 99.31% on four kinds of interference-free Tibetan text images, and data sets of 30,500 Tibetan characters and 132,500 Tibetan syllables are constructed.

3. This thesis proposes a recognition model for printed Tibetan Ujin text based on syllable feature vectors. The syllable-based model has stronger resistance to interference and better generalization ability, and its recognition results are better than those of the character-based model. After the parameters of the syllable-based model are optimized, the test accuracy is 16.39% higher than the original 80.83%.

4. By integrating the Tibetan preprocessing, segmentation and recognition modules, a printed Tibetan Ujin recognition system is developed. The system lets users freely rotate and crop the part of the image they are interested in. This semi-automatic correction function improves the user experience and ease of operation. The system segments one syllable every 0.283 seconds and recognizes one syllable every 0.018 seconds. The average recognition rate on the four interference-free Tibetan text images is 96.53%.
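The abstract does not describe the mixed-mode syllable segmentation method itself. As a rough illustration of what text-level syllable segmentation involves, the sketch below splits Tibetan Unicode text on the tsheg (U+0F0B) and a few related marks, then counts distinct syllables as one might when building a corpus. The function name `split_syllables` and the chosen delimiter set are assumptions made for this example and are not taken from the thesis.

```python
import re
from collections import Counter

# Assumed delimiter set (not from the thesis): tsheg U+0F0B, non-breaking
# tsheg U+0F0C, shad U+0F0D, nyis shad U+0F0E, and any whitespace.
_DELIMITERS = re.compile("[\u0f0b\u0f0c\u0f0d\u0f0e\\s]+")


def split_syllables(text):
    """Split Tibetan text into syllables at the assumed delimiter marks."""
    return [part for part in _DELIMITERS.split(text) if part]


# "bod yig" twice, separated by tsheg marks and closed by a shad.
bod = "\u0f56\u0f7c\u0f51"   # BA + vowel sign O + DA
yig = "\u0f61\u0f72\u0f42"   # YA + vowel sign I + GA
sample = bod + "\u0f0b" + yig + "\u0f0b" + bod + "\u0f0b" + yig + "\u0f0d"

syllables = split_syllables(sample)
syllable_counts = Counter(syllables)   # distinct syllable types and frequencies
print(len(syllables), len(syllable_counts))
```

A delimiter-only split like this does not handle Sanskrit transliterations, abbreviations or missing tsheg marks, which is presumably where a rule-and-pattern mix becomes necessary.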
Keywords/Search Tags: Tibetan printed recognition, Tibetan syllable segmentation, convolutional neural network, syllable feature vector
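The image-side steps named in the abstract (syllable-level image segmentation and blank edge normalization) are not specified in detail here. The sketch below shows one generic way such steps are often done: cutting a binarized text line at runs of blank columns, then cropping each piece to its ink and adding a uniform margin. The function names, the list-of-lists image representation (1 = ink, 0 = blank) and the `margin` parameter are all assumptions for illustration. This plain projection approach does not separate touching glyphs, which the thesis's algorithm is said to handle.

```python
def column_profile(img):
    """Return the ink count of each column of a binary image."""
    return [sum(col) for col in zip(*img)]


def split_on_blank_columns(img):
    """Cut the image at runs of blank columns; one sub-image per ink run."""
    pieces, start = [], None
    # The trailing 0 is a sentinel that closes a run reaching the right edge.
    for x, ink in enumerate(column_profile(img) + [0]):
        if ink and start is None:
            start = x
        elif not ink and start is not None:
            pieces.append([row[start:x] for row in img])
            start = None
    return pieces


def normalize_blank_edges(img, margin=1):
    """Crop to the ink bounding box, then pad with a uniform blank margin."""
    rows = [y for y, row in enumerate(img) if any(row)]
    cols = [x for x, ink in enumerate(column_profile(img)) if ink]
    if not rows or not cols:
        return img  # all-blank image: nothing to crop
    cropped = [row[cols[0]:cols[-1] + 1] for row in img[rows[0]:rows[-1] + 1]]
    width = len(cropped[0]) + 2 * margin
    top = [[0] * width for _ in range(margin)]
    body = [[0] * margin + row + [0] * margin for row in cropped]
    bottom = [[0] * width for _ in range(margin)]
    return top + body + bottom


# Toy binarized line with two ink regions separated by blank columns.
line = [
    [0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 1, 0],
    [0, 1, 0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0, 0],
]
pieces = split_on_blank_columns(line)
normalized = [normalize_blank_edges(p, margin=1) for p in pieces]
for piece in normalized:
    print(piece)
```

Normalizing the blank margin this way gives every segmented unit the same amount of surrounding whitespace before it is resized for a classifier, so the model does not have to learn that offset.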