The Lexical Structure And Semantic Structure Analysis Of Forty Thousand Words

Posted on: 2013-06-12
Degree: Master
Type: Thesis
Country: China
Candidate: H F Qiu
Full Text: PDF
GTID: 2245330395452640
Subject: Linguistics and Applied Linguistics
Abstract/Summary:
The identification of unknown words is a difficult aspect of Chinese information processing. In large-scale automatic segmentation of Chinese text, unidentified unknown words are an important source of errors. Research on unknown words has mostly focused on identification and POS guessing; even when a word is identified, we still know nothing about its meaning, its internal structure, or its similar words. Further research is therefore necessary.

Natural language understanding systems depend on large-scale lexical knowledge bases. However, most existing machine dictionaries are grammar dictionaries, and they can hardly meet the needs of Chinese analysis. A method that combines syntax and semantics is more suitable, so a dictionary covering both grammar and semantics is needed.

This paper surveys and analyzes about forty thousand disyllabic and polysyllabic words, excluding idioms, proper nouns, transliterations, and words with vague structure, from the Lexicon of Common Words in Contemporary Chinese (E-edition) compiled by the Ministry of Education of China. To explore how the internal structure of Chinese compound words influences their syntactic character, we studied these words' construction modes, lexical categories, and cores. We then used a Conditional Random Field model and VC++ to develop a tool that automatically analyzes the structure and semantics of compound words.
Keywords/Search Tags: Compound words, Lexicon, Semantic, Unknown word, Word frequency