
Research And Implementation Of Tibetan Print Recognition System

Posted on: 2020-06-19
Degree: Master
Type: Thesis
Country: China
Candidate: B J Gong
Full Text: PDF
GTID: 2415330575993846
Subject: Tibetan information processing project
Abstract/Summary:
Printed character recognition is a topic within computer pattern recognition and image processing. Its main purpose is to analyze and process printed document images input to a computer, extract the text in the image, and convert it into text that the computer can edit and process. Research on English and Chinese character recognition began earlier. Through the sustained efforts of many scholars it has accumulated rich technical results, many commercial products have appeared, and the recognition rate has exceeded 99%. Because informatization of the Tibetan language started late, Tibetan recognition is still at an early stage even as informatization develops rapidly. Moreover, Tibetan differs considerably from Chinese and English, so the mature techniques of Chinese and English recognition cannot be adopted wholesale. It is therefore necessary to research and design a recognition system suited to Tibetan's own characteristics.

Tibetan is a very ancient language that records the unique Tibetan culture and is an important part of the world's cultural heritage. In the information age, Tibetan texts must be organized and preserved digitally if they are to be disseminated, consulted and exchanged more effectively. Tibetan recognition technology is an effective means to this end, because it avoids a large investment of human and financial resources. Its development is also of great importance for promoting national culture, education and the economy.

To address these problems, this thesis studies in depth the key technologies of character recognition systems and the structural characteristics of Tibetan characters. The system consists of four modules: printed Tibetan document image preprocessing, Tibetan document image segmentation, Tibetan character image feature extraction, and classification and recognition. It is developed on the Windows platform using Python and OpenCV (python+opencv2). The research focuses on the segmentation and classification of Tibetan characters, and proposes a multi-strategy detailed segmentation method and a recognition method based on a secondary classifier. The main work of this thesis is as follows:

1. Image preprocessing. Preprocessing is divided into four steps: grayscale conversion, binarization, denoising and tilt correction. Its main purpose is to remove as much useless information as possible from the printed Tibetan document image and make the text in the image easier to detect. The system mainly adopts preprocessing methods commonly used in text recognition systems, and these achieve good results.

2. Segmentation of Tibetan characters in printed Tibetan document images. Segmentation is divided into line segmentation and character segmentation: each line of text in the image is segmented first, and then the Tibetan characters within each line are segmented. Because Tibetan characters differ in width and height, they overlap and adhere to one another to varying degrees in the image. To solve this problem, this thesis proposes a multi-strategy detailed segmentation method, which segments overlapping characters well.

3. Feature extraction. Feature extraction is a very important step in a printed Tibetan character recognition system. Its main purpose is to extract feature sequences that describe the essential properties of Tibetan characters and so enable the computer to recognize the text. Based on the glyph and structural features of Tibetan characters, this thesis proposes a mixed-feature extraction method whose features mainly comprise baseline features, closed-area-count features and coarse grid features.

4. Classification and recognition. After the features are extracted, they are classified and matched against the feature library to produce the recognition result. Classifier design is a key problem in the recognition process, because the quality of the classifier directly affects recognition accuracy. This thesis therefore designs a secondary classifier based on the mixed features of Tibetan. The classifier compensates for the respective weaknesses of low-dimensional and high-dimensional features, and improves both the speed and the accuracy of recognition.

Finally, experimental tests show that the printed Tibetan text recognition system developed in this thesis recognizes text well, with a recognition rate of 83.24%. At the same time, the functions of some modules need further improvement.
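
The abstract does not give the algorithms in detail, so the following is only a minimal sketch of two of the steps it names: line and character segmentation, and coarse grid features. It assumes the page has already been binarized into rows of 0/1 values (1 = ink) and that segmentation is done with plain projection profiles, which is a common baseline and not necessarily the multi-strategy method of the thesis. All function names (`split_runs`, `segment_lines`, `segment_chars`, `crop`, `coarse_grid_features`) are hypothetical. Plain Python lists are used so the sketch runs without dependencies; with Python and OpenCV the same sums would typically be array reductions.

```python
def split_runs(profile, min_ink=1):
    """Return (start, end) pairs, end exclusive, of consecutive
    profile entries whose value is at least min_ink."""
    runs, start = [], None
    for i, value in enumerate(profile):
        if value >= min_ink and start is None:
            start = i                      # a run of ink begins
        elif value < min_ink and start is not None:
            runs.append((start, i))        # the run ends at a gap
            start = None
    if start is not None:                  # run reaches the last entry
        runs.append((start, len(profile)))
    return runs


def segment_lines(img):
    """Line segmentation: split on empty rows of the horizontal projection."""
    return split_runs([sum(row) for row in img])


def segment_chars(img, top, bottom):
    """Character segmentation inside one line: split on empty columns
    of the vertical projection. Assumption: glyphs are separated by at
    least one blank column, so overlapping or touching glyphs are NOT
    handled here."""
    width = len(img[0])
    columns = [sum(img[y][x] for y in range(top, bottom)) for x in range(width)]
    return split_runs(columns)


def crop(img, top, bottom, left, right):
    """Cut a rectangular glyph image out of the page."""
    return [row[left:right] for row in img[top:bottom]]


def coarse_grid_features(glyph, n=2):
    """Coarse grid features: divide the glyph into n x n cells and
    return the fraction of ink pixels in each cell, row by row."""
    height, width = len(glyph), len(glyph[0])
    features = []
    for gy in range(n):
        y0, y1 = gy * height // n, (gy + 1) * height // n
        for gx in range(n):
            x0, x1 = gx * width // n, (gx + 1) * width // n
            area = max((y1 - y0) * (x1 - x0), 1)   # guard against empty cells
            ink = sum(glyph[y][x] for y in range(y0, y1) for x in range(x0, x1))
            features.append(ink / area)
    return features


# Toy binarized page: two text lines, the first holding two glyphs.
page = [
    [0, 0, 0, 0, 0, 0],
    [1, 1, 0, 0, 1, 0],
    [1, 1, 0, 1, 1, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0, 0],
    [0, 0, 0, 0, 0, 0],
]

for top, bottom in segment_lines(page):
    for left, right in segment_chars(page, top, bottom):
        glyph = crop(page, top, bottom, left, right)
        print((top, bottom, left, right), coarse_grid_features(glyph))
```

Splitting on blank columns fails as soon as neighbouring glyphs overlap or touch, which is the case the abstract says its multi-strategy method is designed for. The grid feature vector is one of the three feature types listed; baseline and closed-area-count features are not sketched here.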
Keywords/Search Tags: Tibetan print recognition, Tibetan character, Characteristic signs, Secondary classification