
Design And Implementation Of Printed Tibetan Recognition Software On Android Platform

Posted on: 2021-02-09
Degree: Master
Type: Thesis
Country: China
Candidate: Y Chen
Full Text: PDF
GTID: 2415330623473106
Subject: Software engineering
Abstract/Summary:
With the rapid spread of smartphones and other mobile terminals, acquiring and transmitting text images from natural scenes has become increasingly convenient, and the demand for fast image processing and information extraction has grown accordingly. How to use mobile devices to quickly and accurately extract the text in an image and recognize it as encoded text is becoming a hot research topic. Recognition software for printed Chinese and Western characters is already practical on mobile platforms, but there is no dedicated recognition software for printed Tibetan. To fill this gap, and based on the characteristics of Tibetan document images, this dissertation first studies line and character segmentation algorithms, then constructs a character sample library, and designs a recognition model suited to printed Tibetan characters. On this basis, it implements printed Tibetan recognition software on the Android platform.

Building on existing recognition techniques for printed Tibetan, this dissertation studies in detail the line segmentation and classification stages of Tibetan recognition. It proposes a line segmentation method that combines baseline information with the center of gravity of connected domains, constructs a dataset of printed Tibetan characters, and designs and trains CovNet, a convolutional neural network model that recognizes 584 Tibetan characters. The main contents of this dissertation are as follows:

(1) A Tibetan character segmentation method based on baseline position information and the center of gravity of connected domains is proposed. This method addresses character sticking, breaking, and overlapping in Tibetan document images and improves the accuracy of line segmentation.

(2) A printed Tibetan character sample dataset (TCDS) is constructed to train the convolutional neural network. The dataset is built by manual sampling and synthetic sampling. For the synthetic part, 646 sets of samples were generated for the 584 commonly used printed Tibetan characters, with effects such as multiple fonts, text distortion, background noise, stroke adhesion, stroke breakage, and text tilt. TCDS contains 736 sets of data, each covering the 584 characters.

(3) A convolutional neural network model, CovNet, is designed and trained to recognize characters. The model achieves a recognition rate of 99.89% on the TCDS dataset.

(4) Printed Tibetan recognition software is designed and implemented on the Android platform. The software supports both online and local recognition, completes all operations with one click, and hides intermediate steps from the user. It achieves a recognition rate of 99.15% on real samples. The software also supports Chinese lookup of commonly used Tibetan phrases, so the recognition results of Tibetan document images can be translated into Chinese.

In summary, a printed Tibetan dataset of 584 characters in 736 sets (429,824 samples in total) was constructed through manual collection and synthesis of character samples, and a convolutional neural network recognition model for printed Tibetan was designed and trained on this dataset. The dissertation also proposes a new line segmentation method for printed Tibetan documents, and designs and implements printed Tibetan recognition software on the Android platform. The software achieves a recognition rate of 99.15% on real samples.
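
The abstract does not give the details of the segmentation method in contribution (1), so the following is only a minimal Python sketch of the general idea of combining baselines with the centers of gravity of connected components. It rests on several assumptions that are not taken from the thesis: the page is already binarized (1 = ink), baselines are taken as rows whose ink count is close to the maximum of the horizontal projection, and each connected component is assigned to the baseline nearest to its centroid row. The function names and the `min_fraction` threshold are hypothetical.

```python
from collections import deque


def connected_components(img):
    """Return the ink components of a binary image as lists of (row, col),
    using 8-connectivity and a breadth-first flood fill."""
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    comps = []
    for r in range(h):
        for c in range(w):
            if img[r][c] and not seen[r][c]:
                seen[r][c] = True
                queue, pixels = deque([(r, c)]), []
                while queue:
                    y, x = queue.popleft()
                    pixels.append((y, x))
                    for dy in (-1, 0, 1):
                        for dx in (-1, 0, 1):
                            ny, nx = y + dy, x + dx
                            if (0 <= ny < h and 0 <= nx < w
                                    and img[ny][nx] and not seen[ny][nx]):
                                seen[ny][nx] = True
                                queue.append((ny, nx))
                comps.append(pixels)
    return comps


def centroid_row(pixels):
    """Vertical coordinate of a component's center of gravity."""
    return sum(y for y, _ in pixels) / len(pixels)


def find_baselines(img, min_fraction=0.6):
    """Assumed baseline detector: the first row of every run of rows whose
    ink count is at least min_fraction of the densest row."""
    counts = [sum(row) for row in img]
    threshold = min_fraction * max(counts)
    baselines, prev_hit = [], False
    for r, n in enumerate(counts):
        hit = n > 0 and n >= threshold
        if hit and not prev_hit:
            baselines.append(r)
        prev_hit = hit
    return baselines


def assign_to_lines(img):
    """Group components by the baseline closest to their centroid row
    (an assumed rule, not necessarily the one used in the thesis)."""
    baselines = find_baselines(img)
    lines = {b: [] for b in baselines}
    for pixels in connected_components(img):
        cy = centroid_row(pixels)
        nearest = min(baselines, key=lambda b: abs(b - cy))
        lines[nearest].append(pixels)
    return lines


# Toy page: two text lines plus a detached mark floating above the second one.
toy = [[int(ch) for ch in row] for row in [
    "00000000",
    "11111110",
    "01000100",
    "01000100",
    "01000000",
    "00000000",
    "00010000",
    "00000000",
    "11111110",
    "00100000",
    "00100000",
    "00000000",
]]
print({b: len(comps) for b, comps in assign_to_lines(toy).items()})
# prints {1: 1, 8: 2}
```

In the toy page, the detached mark on row 6 is attached to the second line because its centroid is closer to the baseline on row 8 than to the one on row 1. Real Tibetan pages would need a more careful rule, for example to handle marks above and below the baseline differently.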
Keywords/Search Tags:Tibetan recognition, Baseline, Center of gravity of connected domain, Convolutional neural network, TCDS dataset