The Research On Tibetan Automatic Word Segmentation Technology

Posted on: 2011-12-25
Degree: Master
Type: Thesis
Country: China
Candidate: B D Z Pu
GTID: 2155360308459598
Subject: Chinese Ethnic Language and Literature
Abstract/Summary:
Tibetan word segmentation is an indispensable foundation of Tibetan information processing. Segmentation is needed almost everywhere: in text input systems (such as intelligent sentence input, speech input, and handwriting input), in word processing (such as proofreading), and in speech synthesis, text retrieval, text classification, natural language interfaces, automatic summarization, and other applications. It is the core of Tibetan information processing and the basis of Tibetan natural language understanding.

English words are separated by spaces, but Tibetan sentences have no explicit delimiter (such as a space) between words. The basic unit of written Tibetan is the syllable: syllables combine into words, and words into sentences that express meaning. Because a Tibetan sentence is written as a sequence of syllables, a computer must segment it into words before it can process its meaning. Tibetan word segmentation is the task of dividing a sequence of Tibetan syllables into meaningful words.

This paper surveys word segmentation technology as a whole and introduces the theory and methods of both Chinese and Tibetan word segmentation. It describes the basic concepts and the current state of research in Tibetan word segmentation, and introduces the segmentation unit and the methods used for Tibetan.

The paper then proposes a Tibetan word segmentation approach: sentences are first split at Tibetan natural markers, then divided into blocks at auxiliary words, and each block is segmented by dictionary matching combined with statistics. It presents this method together with its key processing techniques, which include identification of case auxiliaries (case particles), identification of overlapping and combinational ambiguities, and recognition of out-of-vocabulary words. For recognizing proper noun phrases and new words, it proposes a method that combines rules with statistics...
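The abstract does not spell out the matching step, so the following is only a minimal sketch under stated assumptions, not the thesis's method. Written Tibetan separates syllables with the tsheg (་, U+0F0B) and ends clauses with the shad (།, U+0F0D). Here the shad is assumed to be the "natural marker" used for sentence splitting. The sketch splits each sentence into syllables and then groups them into words by forward and backward maximum matching against a lexicon. When the two directions disagree, the sentence is flagged as a candidate overlapping ambiguity. The function names (`split_sentences`, `forward_max_match`, `segment`, ...) and the toy lexicons are hypothetical. The thesis's case-particle blocking and its rule-plus-statistics disambiguation are not reproduced.

```python
import re

# Tibetan punctuation code points (from the Unicode Tibetan block).
TSHEG = "\u0F0B"      # intersyllabic mark between syllables
SHAD = "\u0F0D"       # shad: marks the end of a clause or sentence
NYIS_SHAD = "\u0F0E"  # double shad

Word = tuple[str, ...]  # a word is represented as a tuple of syllables


def split_sentences(text: str) -> list[str]:
    """Split text at shad marks (assumed here to be the 'natural markers')."""
    parts = re.split(f"[{SHAD}{NYIS_SHAD}]+", text)
    return [p.strip() for p in parts if p.strip()]


def split_syllables(sentence: str) -> list[str]:
    """Split a sentence into syllables at each tsheg, dropping empty pieces."""
    return [s.strip() for s in sentence.split(TSHEG) if s.strip()]


def forward_max_match(syls: list[str], lexicon: set[Word], max_len: int) -> list[Word]:
    """Left-to-right longest match; an unmatched syllable becomes a one-syllable word."""
    max_len = max(max_len, 1)
    words: list[Word] = []
    i = 0
    while i < len(syls):
        # Try the longest candidate first, shrinking until a lexicon hit or one syllable.
        for n in range(min(max_len, len(syls) - i), 0, -1):
            cand = tuple(syls[i:i + n])
            if n == 1 or cand in lexicon:
                words.append(cand)
                i += n
                break
    return words


def backward_max_match(syls: list[str], lexicon: set[Word], max_len: int) -> list[Word]:
    """Right-to-left longest match, mirroring forward_max_match."""
    max_len = max(max_len, 1)
    words: list[Word] = []
    j = len(syls)
    while j > 0:
        for n in range(min(max_len, j), 0, -1):
            cand = tuple(syls[j - n:j])
            if n == 1 or cand in lexicon:
                words.insert(0, cand)
                j -= n
                break
    return words


def segment(text: str, lexicon: set[Word]) -> list[tuple[list[Word], bool]]:
    """Return (forward segmentation, ambiguity flag) for each sentence.

    The flag is True when forward and backward matching disagree, a common
    heuristic for spotting overlapping ambiguity; resolving it is not shown.
    """
    max_len = max([1] + [len(w) for w in lexicon])
    results = []
    for sentence in split_sentences(text):
        syls = split_syllables(sentence)
        fwd = forward_max_match(syls, lexicon, max_len)
        bwd = backward_max_match(syls, lexicon, max_len)
        results.append((fwd, fwd != bwd))
    return results


if __name__ == "__main__":
    # Placeholder syllables "A", "B", "C" with a hypothetical two-word lexicon.
    lex = {("A", "B"), ("B", "C")}
    print(forward_max_match(["A", "B", "C"], lex, 2))   # [('A', 'B'), ('C',)]
    print(backward_max_match(["A", "B", "C"], lex, 2))  # [('A',), ('B', 'C')]
```

Comparing the two directions only detects overlapping (crossing) ambiguity. Combinational ambiguity is different: a multi-syllable lexicon word could also be read as separate words, but both directions pick the longer word and agree. That is one reason statistical evidence is still needed.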
Keywords/Search Tags: Tibetan word segmentation, auxiliary words, out-of-vocabulary words, statistics, matching, word segmentation methods