Research On Knowledge Base Construction For Tibetan Function Words

Posted on: 2013-08-30  Degree: Master  Type: Thesis
Country: China  Candidate: R S Z Cai  Full Text: PDF
GTID: 2235330395970846  Subject: Chinese Ethnic Language and Literature
Abstract/Summary:
With the rapid development and widespread use of computers and the Internet, human beings have entered an information society, and computer-based information processing has become the main theme of this era. Research on Tibetan information processing is moving from "word" processing to "language" processing and has made great achievements in theoretical exploration and in the construction of basic knowledge bases and corpora. Compared with rapidly expanding practical needs, however, Tibetan information processing technology still lags behind demand, mainly because too little linguistic knowledge has been stored in computer-usable form. A language knowledge base is an important part of a natural language processing system, and its size and quality are key to the system's success or failure. For Tibetan, the construction of language knowledge bases deserves particular emphasis. At present, the knowledge bases in the field of Tibetan information processing consist mainly of syntactic information dictionaries of notional words and various corpora; no systematic knowledge base of Tibetan function words has yet been established. The construction of a Tibetan function word knowledge base is therefore the weak link in the field, and an unavoidable foundational project if Tibetan information processing is to reach a new level.
The thesis is divided into eight chapters.
The first chapter, "Introduction", mainly introduces the background and significance, research status, and research purposes of Tibetan function word knowledge base construction. It also introduces the development and achievements of Tibetan language information processing. 
In the information age, research must break away from traditional methods of language study, adopt a new formalized framework, and take the research objectives, purposes, and methods of Tibetan grammar as its entry point, in order to provide effective language resources for Tibetan-oriented information processing.
The second chapter, "The Construction of the Tibetan Function Word Knowledge Base", introduces the general characteristics and role of Tibetan function words and the importance, construction methods, and contents of the knowledge base. Knowledge of function words plays an important role in lexical analysis, parsing, and machine translation in Tibetan information processing. The construction method borrows Liu Yun's idea of the "Trinity": building a machine dictionary of Tibetan function words, a Tibetan corpus, and a rule base of Tibetan function words. According to the requirements of information processing, Tibetan function words are divided into three groups: case particles, free function words, and non-free function words.
Chapters III, IV, and V are the focus of the thesis. They comprehensively describe the contents and methods of knowledge base construction for Tibetan case particles, free function words, and non-free function words. They describe 19 Tibetan case particles, 20 free function words, and 47 non-free function words, a total of 86 Tibetan function words, and establish a machine dictionary once the grammatical categories and field set have been defined. Frequency and occurrence-count statistics are compiled from the established 400,000,000-word corpus, and rules are written for each Tibetan function word.
Chapter VI, "Experiment and Result", takes the tag set based on case particles as its object and runs experiments on a manually labeled 1,000,000-word corpus. The results show that the effect is remarkable and that the intended purpose is achieved. 
The labeling accuracy is 100%.
Chapter VII, "The Difficulty of Building the Knowledge Base of Tibetan Function Words", mainly explains the problems of Tibetan function word classification, the classes and semi-syntax of Tibetan function words, the machine-oriented description of Tibetan function words, and other aspects.
The eighth chapter concludes the thesis with a summary of the existing research work and plans for further research.
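The "Trinity" of machine dictionary, corpus, and rule base described in the abstract can be pictured with a small Python sketch. Everything in it is an illustrative assumption rather than the thesis's actual design: the field names, the placeholder tokens FW_A and FW_B, the "O" tag for tokens outside the dictionary, and the single lookup rule are all hypothetical. It only shows how a dictionary entry, corpus statistics, and a labeling-accuracy figure might relate to one another.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class FunctionWordEntry:
    """One machine-dictionary record (field names are hypothetical)."""
    form: str                        # surface form of the function word
    category: str                    # "case particle", "free", or "non-free"
    count: int = 0                   # occurrences in the corpus
    relative_frequency: float = 0.0  # count divided by corpus size


def add_corpus_statistics(tokens, entries):
    """Fill in count and relative frequency for each dictionary entry."""
    counts = Counter(tokens)
    total = len(tokens)
    for entry in entries.values():
        entry.count = counts[entry.form]
        entry.relative_frequency = entry.count / total if total else 0.0
    return entries


def tag_tokens(tokens, entries):
    """Toy rule: tag a token with its dictionary category, else "O"."""
    return [entries[t].category if t in entries else "O" for t in tokens]


def labeling_accuracy(predicted, gold):
    """Share of positions where the predicted tag equals the gold tag."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    if not gold:
        return 0.0
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)


if __name__ == "__main__":
    # Placeholder tokens; a real corpus would hold segmented Tibetan text.
    corpus = ["n1", "FW_A", "v1", "FW_A", "FW_B", "n2", "FW_A", "v2"]
    dictionary = {
        "FW_A": FunctionWordEntry("FW_A", "case particle"),
        "FW_B": FunctionWordEntry("FW_B", "free"),
    }
    add_corpus_statistics(corpus, dictionary)
    tags = tag_tokens(corpus, dictionary)
    print(dictionary["FW_A"], tags)
```

A real rule base would need context-sensitive rules, since a pure dictionary lookup cannot separate different uses of the same form; the lookup rule here only marks where such rules would plug in.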
Keywords/Search Tags:Tibetan function words, Tibetan information processing, knowledge base, machine dictionary, rule base