A Study of Information Processing in Tibetan Proverb Corpus Building

Posted on: 2017-10-14; Degree: Master; Type: Thesis
Country: China; Candidate: N C Suo; Full Text: PDF
GTID: 2335330491456707; Subject: Chinese Ethnic Language and Literature
Abstract/Summary:
This thesis first collects and inputs Tibetan proverbs drawn from the Amdo, Kham and Weizang (Ü-Tsang) dialects and from the "Gesar" epic, and builds a corpus of Tibetan proverbs from them. The text is segmented automatically and proofread by hand, and the principles of lexical segmentation are revised, in order to build a Tibetan proverb corpus and thesaurus. In terms of content, Tibetan proverbs are subdivided into twelve types according to the relevant literature. In the process of collecting and sorting, new types were added and the proverbs are divided by form into thirty-two kinds. The proverbs are analyzed in terms of the distribution of entry counts, vocabulary frequency and lexical frequency. Finally, entries can be sorted and retrieved by the three regional dialects, by Tibetan-Chinese parallel text, by alphabetical order and by content classification. The corpus serves two main purposes. First, it supports Tibetan information processing. Second, it can act as a reference book for Tibetan studies and as a basic resource for language learners and researchers studying the vocabulary of Tibetan proverbs. The purpose of this thesis is to lay groundwork for future Tibetan information processing in syntactic category tagging, automatic word segmentation, syntactic research, phrase research, machine translation and electronic dictionary compilation. It also provides a new research method and tool for the future study of Tibetan literature. Its innovations are as follows. First, it collects a large number of scattered Tibetan proverbs, the most gathered so far. Second, the proverbs are classified and labeled for computer information processing. Third, a bilingual corpus of Tibetan proverbs is built. Fourth, a retrieval procedure is constructed for the proverbs, which provides convenient conditions for future study and for research on bilingual teaching.
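To illustrate the frequency statistics and retrieval described above, here is a minimal Python sketch. It is not the thesis's implementation. It assumes each corpus entry is a record with a Tibetan text, a Chinese gloss, a dialect label and a content category. The `Entry` class, its field names, the category labels and the sample strings are all hypothetical placeholders, not proverbs from the corpus. The sketch counts syllables by splitting on the tsheg mark (U+0F0B), which is simpler than the word segmentation the thesis performs, and it sorts by Unicode code point, which only approximates Tibetan dictionary order.

```python
from collections import Counter
from dataclasses import dataclass

TSHEG = "\u0f0b"  # Tibetan intersyllabic mark
SHAD = "\u0f0d"   # Tibetan clause-final mark


@dataclass(frozen=True)
class Entry:
    """Hypothetical record layout for one corpus entry."""
    tibetan: str
    chinese: str   # parallel Chinese gloss (placeholder text here)
    dialect: str   # e.g. "Amdo", "Kham", "Weizang"
    category: str  # content class; labels below are invented


def syllables(text):
    """Split Tibetan text into syllables on tsheg, shad and spaces."""
    cleaned = text.replace(SHAD, TSHEG).replace(" ", TSHEG)
    return [s for s in cleaned.split(TSHEG) if s]


def syllable_frequency(entries):
    """Count how often each syllable occurs across all entries."""
    counts = Counter()
    for entry in entries:
        counts.update(syllables(entry.tibetan))
    return counts


def retrieve(entries, dialect=None, category=None):
    """Filter by dialect and/or category, then sort by code point.

    Code-point order is an approximation: real Tibetan collation
    needs dedicated rules that this sketch does not implement.
    """
    hits = [
        e for e in entries
        if (dialect is None or e.dialect == dialect)
        and (category is None or e.category == category)
    ]
    return sorted(hits, key=lambda e: e.tibetan)


# Placeholder entries (ordinary words, not real proverbs).
SAMPLE = [
    Entry("བོད་ཡིག་བོད་སྐད།", "placeholder gloss 1", "Amdo", "education"),
    Entry("བོད་སྐད།", "placeholder gloss 2", "Kham", "education"),
]

if __name__ == "__main__":
    for syllable, n in syllable_frequency(SAMPLE).most_common():
        print(syllable, n)
    for entry in retrieve(SAMPLE, dialect="Amdo"):
        print(entry.tibetan, entry.chinese)
```

Keeping the dialect and category as plain fields on each record lets one filter function cover all the retrieval modes the abstract lists. A production system would likely replace `syllables` with a proper word segmenter and the sort key with a Tibetan collation routine.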
The next step is to translate all the Tibetan proverbs I have collected. Making the content, form, paragraph and syllable-pause annotations available by clicking the relevant entry in the mixed sort is a task for further study. This thesis argues that building a high-quality Tibetan proverb library not only helps people master and use Tibetan proverbs better and provides an indispensable resource for the study of Tibetan language and literature, but also expands the text databases available for Tibetan natural language processing.
Keywords/Search Tags: Tibetan proverbs, Tibetan proverbs corpus, Tagging, Retrieval