
Printed Tibetan Character Recognition System's Research And Realization

Posted on: 2008-08-26
Degree: Master
Type: Thesis
Country: China
Candidate: G Li
GTID: 2178360242476306
Subject: Software Engineering
Abstract/Summary:
Tibetan, an elegant language with a long history, has an abundance of classical literature, ancient texts, and translated works. It has played a major role in preserving and developing Tibetan culture and science, in enriching the cultural heritage of China, and in promoting the cause of socialist construction. To carry these traditions forward, we have a responsibility to use the most advanced information-processing techniques to record and present traditional Tibetan culture in digital form. Research on the automatic recognition of Tibetan text is therefore highly meaningful.

Building on existing printed Tibetan recognition technology, this thesis studies feature extraction and classification algorithms for Tibetan character recognition. Drawing on information theory, I propose a feature extraction method and a recognition algorithm that together form the core of the printed Tibetan recognition system. The main work is as follows.

First, a feature extraction and recognition algorithm based on information theory is proposed. Samples are evaluated with information entropy with respect to three typical features of Tibetan characters. A classification and recognition algorithm is then designed by combining the Euclidean distance measure with mutual information: by jointly using the mutual information of the high-dimensional feature vector and the similarity measured along each dimension, the adverse effects of high dimensionality are eliminated. At the same time, the algorithm keeps the advantages of traditional designs, namely simplicity and low computational cost.

Second, in designing the printed Tibetan recognition system, a multi-class classification strategy based on the mutual information metric is proposed.
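The first contribution combines entropy-based feature evaluation with a Euclidean distance measure and mutual information. The abstract does not give the formulas, so the Python sketch below is only one plausible reading under stated assumptions: features are already quantized to small integers, the mutual information I(F; C) = H(F) - H(F | C) between each feature dimension and the class label serves as a per-dimension weight, and a sample is assigned to the nearest class-mean template under the weighted Euclidean distance. The weighting scheme, the function names, and the toy labels are hypothetical and are not taken from the thesis.

```python
import math
from collections import Counter, defaultdict


def entropy(values):
    """Shannon entropy H(X), in bits, of a sequence of discrete values."""
    n = len(values)
    counts = Counter(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def mutual_information(feature, labels):
    """I(F; C) = H(F) - H(F | C) for one discretized feature column."""
    n = len(labels)
    by_class = defaultdict(list)
    for f, c in zip(feature, labels):
        by_class[c].append(f)
    h_cond = sum(len(fs) / n * entropy(fs) for fs in by_class.values())
    return entropy(feature) - h_cond


def mi_weights(samples, labels):
    """Hypothetical scheme: weight each dimension by its MI with the class."""
    dims = len(samples[0])
    return [mutual_information([s[d] for s in samples], labels)
            for d in range(dims)]


def class_templates(samples, labels):
    """Mean feature vector (template) of every class."""
    sums, counts = {}, Counter(labels)
    for s, c in zip(samples, labels):
        acc = sums.setdefault(c, [0.0] * len(s))
        for d, v in enumerate(s):
            acc[d] += v
    return {c: [v / counts[c] for v in acc] for c, acc in sums.items()}


def weighted_distance(x, t, w):
    """Euclidean distance with each squared difference scaled by its weight."""
    return math.sqrt(sum(wd * (xd - td) ** 2 for xd, td, wd in zip(x, t, w)))


def classify(x, templates, w):
    """Return the label of the nearest template under the weighted distance."""
    return min(templates, key=lambda c: weighted_distance(x, templates[c], w))
```

With samples `[[0, 1, 3], [0, 1, 2], [1, 0, 3], [1, 0, 2]]` labelled `["ka", "ka", "kha", "kha"]`, the first two dimensions each receive a weight of 1 bit, while the third, which carries no information about the class, receives 0 and therefore cannot distort the distance. Down-weighting uninformative dimensions in this way is one simple route to the dimensionality benefit the abstract describes.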
Following the mutual-information-based strategy, a three-level classifier is designed; in laboratory tests its accuracy is higher than that of traditional classifiers. Key techniques in the preprocessing and post-processing modules are also analyzed, such as the choice of the normalized lattice (dot-matrix) size during preprocessing.

Finally, tests on a considerable amount of data show that the system's recognition rate is markedly improved and that the results are satisfactory.
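The three-level classifier and the lattice-size normalization mentioned above can be illustrated with a generic coarse-to-fine cascade. The abstract does not say which features each level uses, so everything here is an assumption: images are binary bitmaps, normalization is nearest-neighbour resampling to a `size` x `size` lattice, level 1 filters templates by total ink, level 2 ranks them by projection profiles, and level 3 decides by cell-by-cell (Hamming) comparison. `ThreeLevelClassifier`, its parameters, and the helper names are hypothetical.

```python
def normalize_lattice(bitmap, size):
    """Resample a binary image to a size x size lattice (nearest neighbour)."""
    h, w = len(bitmap), len(bitmap[0])
    return [[bitmap[r * h // size][c * w // size] for c in range(size)]
            for r in range(size)]


def ink(grid):
    """Number of black (1) cells in a lattice."""
    return sum(sum(row) for row in grid)


def projections(grid):
    """Projection profiles: ink per row followed by ink per column."""
    return [sum(row) for row in grid] + [sum(col) for col in zip(*grid)]


def hamming(a, b):
    """Number of lattice cells in which two grids differ."""
    return sum(x != y for ra, rb in zip(a, b) for x, y in zip(ra, rb))


class ThreeLevelClassifier:
    """Hypothetical cascade: each level applies a costlier test to fewer candidates."""

    def __init__(self, templates, size=16, ink_tolerance=0.25, keep=5):
        self.size = size
        # Level-1 tolerance expressed as a fraction of the lattice area (assumed).
        self.tol = ink_tolerance * size * size
        self.keep = keep
        self.grids = {lab: normalize_lattice(bmp, size)
                      for lab, bmp in templates.items()}
        self.ink = {lab: ink(g) for lab, g in self.grids.items()}
        self.proj = {lab: projections(g) for lab, g in self.grids.items()}

    def classify(self, bitmap):
        grid = normalize_lattice(bitmap, self.size)
        # Level 1: keep templates whose total ink is close to the query's.
        q_ink = ink(grid)
        cands = [lab for lab in self.grids
                 if abs(self.ink[lab] - q_ink) <= self.tol]
        if not cands:  # fall back to all templates if the filter is too strict
            cands = list(self.grids)
        # Level 2: rank by squared projection-profile distance, keep the best few.
        q_proj = projections(grid)
        cands.sort(key=lambda lab: sum((a - b) ** 2
                                       for a, b in zip(q_proj, self.proj[lab])))
        cands = cands[:self.keep]
        # Level 3: final decision by cell-by-cell comparison of the lattices.
        return min(cands, key=lambda lab: hamming(grid, self.grids[lab]))
```

Because every query is first resampled to the same lattice, the classifier accepts images of any resolution. Running it on held-out data with several values of `size` is one simple way to study the normalized lattice size that the abstract identifies as a key preprocessing choice.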
Keywords: Printed Tibetan recognition, information theory, mutual information metrics, feature extraction, multi-level classification