
The Automatic Segmentation Technology Based On "Recessive Vocabulary And Specialized Thesaurus"

Posted on: 2008-09-05
Degree: Master
Type: Thesis
Country: China
Candidate: Y Liu
GTID: 2155360215487397
Subject: Linguistics and Applied Linguistics
Abstract/Summary:
Chinese information processing is an important computer application technology in China. The State Council explicitly points out in its national medium- and long-term scientific and technological development program, "Chinese information processing technology is the focus of high-tech development." Automatic segmentation of written Chinese is an acknowledged difficulty in Chinese information processing. Any research that involves syntax or semantics (such as machine translation or natural language understanding) relies on segmentation to supply its basic units. Automatic segmentation is therefore the most fundamental task in every area of Chinese information processing, and it is also the "bottleneck" of the field.
Existing segmentation methods have been limited to the segmentation and understanding of traditional text. However, written Chinese runs its characters together without marking word boundaries, which makes automatic segmentation of Chinese text very difficult. Chinese scholars have developed a number of automatic segmentation systems since the 1980s, but these systems still fall well short of practical use. Among them, Luo Haiqing's "list of Recessive Vocabulary" is a relatively good system. It is written in assembly language and has the advantages of a small footprint, fast running speed, and low dependence on the host system. We make a series of improvements based on this system, aiming to further improve segmentation accuracy while keeping its advantage in segmentation speed.
This paper is divided into five parts. The first part is a literature review, which introduces the importance of automatic segmentation of written Chinese and surveys research from the early 1980s to the present. The second part is a detailed introduction to Luo's "list of Recessive Vocabulary" automatic segmentation technology, with an analysis of its strengths and weaknesses compared with other segmentation software of the same period. In the third part we propose a segmentation model named "Recessive Vocabulary + Specialized Thesaurus" and select one subject area of the Specialized Thesaurus as a concrete demonstration. The fourth part summarizes the paper. The fifth part is the table of the Specialized Thesaurus.
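To illustrate the general idea of combining a base word list with a domain-specific one, here is a minimal Python sketch. It is an assumption-laden illustration only: the abstract does not describe how the "list of Recessive Vocabulary" works, so the sketch substitutes plain forward maximum matching, a common dictionary-based baseline, and it is not the thesis's algorithm. The function name `segment`, its parameters, and the sample word lists are all hypothetical.

```python
def segment(text, general_lexicon, specialized_thesaurus, max_len=4):
    """Forward maximum matching over the union of two word lists.

    Hypothetical sketch: a stand-in for dictionary-based segmentation,
    not the method described in the thesis.
    """
    vocabulary = set(general_lexicon) | set(specialized_thesaurus)
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, then shrink until a word matches.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            # A single character is always accepted as a fallback token.
            if size == 1 or candidate in vocabulary:
                tokens.append(candidate)
                i += size
                break
    return tokens


# Hypothetical sample data: a tiny general lexicon and a domain term list.
general = {"自动", "分词", "技术"}
specialized = {"自动分词"}

without_domain = segment("自动分词技术", general, set())
with_domain = segment("自动分词技术", general, specialized)
```

In this sketch, adding the domain term list lets the longer term be matched as a single unit instead of being split into two general-lexicon words, which is the kind of effect a specialized thesaurus is meant to have.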
Keywords/Search Tags: automatic segmentation, Recessive Vocabulary, Specialized Thesaurus, model