Analysis and Recognition of Unknown Suffixed Derivative Words in Modern Chinese

Posted on: 2013-01-21 | Degree: Master | Type: Thesis
Country: China | Candidate: Q Wang
GTID: 2245330395453248 | Subject: Linguistics and Applied Linguistics
Abstract/Summary:
In the field of Chinese information processing, the basic task of Chinese language analysis is automatic word segmentation. Automatic segmentation currently faces two problems: the recognition of unknown words and the resolution of ambiguous segmentations. Recognizing unknown words is essential to segmenting Chinese text correctly, yet it remains difficult.

In recent years, many Chinese scholars working on unknown word recognition have focused on named entity recognition and have achieved a great deal. Research on recognizing suffixed words, however, is scarce. Most of it is based on individual cases, and exhaustive research within a defined range has rarely been done. Furthermore, there is no research on word formation with suffixes, so the internal structure of these words has been neglected.

Based on theories of suffix segmentation, this dissertation applies quantitative database methods, an important tool in computational linguistics research, to set a standard for suffix segmentation and to compile a suffix table for information processing.

The dissertation analyses every suffix in the suffix table under different categorizations and observes the grammatical and semantic characteristics of each derivative (word-formation) model. Using the suffix segmentation standard, it then studies the word-formation models of known words in the database quantitatively.

In the research on derivative (word-formation) models, the dissertation categorizes suffixes by meaning. In light of the distribution of unknown words in the database, it focuses on the characteristics and word-formation models of unknown derivative words formed with "们" and "者".

In the research on unknown word recognition, the dissertation conducts two parallel sets of experiments, divided according to the word-formation capability of the derivatives, and designs feature templates accordingly. It also conducts recognition experiments based on Conditional Random Fields and uses the results to test the feasibility of the approach. Finally, it draws an overall conclusion, summarizes the main work, and outlines a plan for further research.
Keywords/Search Tags: Unknown Words, Suffix, Derivatives (Word-Formation) Model, CRF Model
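The abstract mentions feature templates for CRF-based recognition but does not list them. As a rough illustration only, the sketch below shows what a character-level feature template with a suffix cue might look like. The two-entry suffix set, the window size, the feature names, and the B/M/E/S tagging scheme are all assumptions made for this example and are not taken from the thesis; the sample segmentation is likewise illustrative.

```python
from typing import Dict, List, Union

Feature = Union[str, bool, float]

# Hypothetical stand-in for the thesis's suffix table, which is not reproduced here.
SUFFIXES = {"们", "者"}


def bmes_tags(words: List[str]) -> List[str]:
    """Turn a segmented sentence into per-character B/M/E/S labels."""
    tags: List[str] = []
    for word in words:
        if len(word) == 1:
            tags.append("S")
        else:
            tags.extend(["B"] + ["M"] * (len(word) - 2) + ["E"])
    return tags


def char_features(chars: List[str], i: int, window: int = 2) -> Dict[str, Feature]:
    """Assumed feature template for the character at position i."""
    feats: Dict[str, Feature] = {"bias": 1.0}
    # Unigram features over a +/- window of characters, padded at sentence edges.
    for off in range(-window, window + 1):
        j = i + off
        feats[f"c[{off}]"] = chars[j] if 0 <= j < len(chars) else "<PAD>"
    # Bigram features joining the current character with each neighbour.
    feats["c[-1]c[0]"] = str(feats["c[-1]"]) + str(feats["c[0]"])
    feats["c[0]c[1]"] = str(feats["c[0]"]) + str(feats["c[1]"])
    # Suffix cues: is this character, or the next one, in the suffix table?
    feats["is_suffix"] = chars[i] in SUFFIXES
    feats["next_is_suffix"] = i + 1 < len(chars) and chars[i + 1] in SUFFIXES
    return feats


def sentence_features(sentence: str) -> List[Dict[str, Feature]]:
    """Extract one feature dictionary per character of the sentence."""
    chars = list(sentence)
    return [char_features(chars, i) for i in range(len(chars))]


if __name__ == "__main__":
    words = ["学生们", "来", "了"]  # illustrative segmentation only
    labels = bmes_tags(words)
    for feats, label in zip(sentence_features("".join(words)), labels):
        print(label, feats["c[0]"], feats["is_suffix"])
```

Sequences of feature dictionaries paired with label sequences like these are the kind of input a CRF toolkit could be trained on. Which features actually help for "们" and "者" derivatives is the sort of question the parallel experiments described above would address.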