
A Study on the Semantic Word-Formation of Two-Character Words for the Identification and Understanding of Common Unknown Words

Posted on: 2016-05-18
Degree: Master
Type: Thesis
Country: China
Candidate: Z W Ji
GTID: 2175330470984130
Subject: Linguistics and Applied Linguistics
Abstract/Summary:
In recent years, with the rise of lexicalism, lexical semantics has become one of the leading research areas in Chinese information processing. The lexicon, as a subsystem of the language, is highly idiosyncratic and changes constantly, which makes it harder to study. Morphemes, by contrast, are the basic components of word formation. They are relatively limited in number and stable in semantic function. For this reason, using morphemes as the basic resource for tracing the semantic patterns of word formation, and thereby helping machines understand natural language, has attracted growing attention.

Chinese automatic word segmentation is the foundation of Chinese information processing. In segmentation methods based on a word table, unknown words (words not listed in the table) are the key factor limiting segmentation precision. Among unknown words, common new words have diverse word-formation patterns, variable functions, and an open-ended inventory. These properties make them difficult for existing statistics-based segmentation methods to identify. In practice, those methods can identify only high-frequency words, which leaves them largely helpless with low-frequency words.

In this thesis, I build a database of modern Chinese morphemes, using the two-character and three-character words of the Modern Chinese Dictionary (6th edition) as the survey corpus. From this morpheme database I select 50 high-frequency morphemes and take the 8,984 two-character words that contain them as the closed set of words under study. The procedure has four steps. First, I label the sense of each of the 8,984 two-character words according to the Modern Chinese Dictionary (6th edition). Second, I label the senses of the front and back morphemes of each word, based on the word's meaning and on HowNet. Third, I label the lexicalized meaning that arises between the morphemes from four perspectives: the structure of semantic combination, the distribution of the semantic root, the mode of semantic combination, and the type of semantic variation. Finally, I combine the morpheme senses with the lexicalized meanings to build a semantic description system of two-character words grounded in quantitative statistics.

The semantic description system of two-character words consists mainly of the interpretation modes of the 8,984 words, a database of morphemes and sense categories, and distribution tables of morpheme meanings, lexicalized meanings, and interpretation modes. Using this system, I identify and interpret the 1,413 new words from the Modern Chinese Dictionary (6th edition) and new words in a natural corpus drawn from an online forum. The results show that studying the semantic word-formation of two-character words has some practical value for identifying and understanding common unknown words.
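The abstract's point that unknown words limit word-table segmentation can be illustrated with forward maximum matching, a classic dictionary-based method. It is used here only as an illustration and is not the thesis's own method. This minimal sketch assumes a tiny hypothetical lexicon in which 网购 ("online shopping") is missing, so it plays the role of an unknown word. The segmenter breaks that word into single characters. A simple heuristic that joins consecutive single-character tokens then flags the fragment as an unknown-word candidate. All names (`LEXICON`, `forward_max_match`, `single_char_runs`) are hypothetical.

```python
# Toy lexicon (hypothetical): 网购 "online shopping" is deliberately missing,
# so it plays the role of an unknown word.
LEXICON = {"我们", "喜欢", "网络", "购物"}


def forward_max_match(text: str, lexicon: set[str], max_len: int = 4) -> list[str]:
    """Greedy left-to-right segmentation: at each position take the longest
    lexicon entry, falling back to a single character when nothing matches."""
    tokens: list[str] = []
    i = 0
    while i < len(text):
        # Try the longest window first and shrink it until it hits the lexicon.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in lexicon:
                tokens.append(piece)
                i += size
                break
    return tokens


def single_char_runs(tokens: list[str]) -> list[str]:
    """Join runs of two or more consecutive single-character tokens; such
    fragments are a simple signal that an unknown word was split apart."""
    runs: list[str] = []
    current: list[str] = []
    for tok in tokens + [""]:  # empty sentinel flushes the final run
        if len(tok) == 1:
            current.append(tok)
            continue
        if len(current) >= 2:
            runs.append("".join(current))
        current = []
    return runs


tokens = forward_max_match("我们喜欢网购", LEXICON)
print(tokens)                    # ['我们', '喜欢', '网', '购']
print(single_char_runs(tokens))  # ['网购']
```

A real system would then check a candidate such as 网购 against morpheme and sense data. That is the step where a morpheme-based description system like the one described above could be applied.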
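The step of selecting high-frequency morphemes and then enclosing every two-character word that contains one can be sketched as a frequency count followed by a filter. This sketch assumes, for simplicity, that each character is one morpheme. A real morpheme database would distinguish different morphemes written with the same character. The word list and the cutoff of 2 are toy values standing in for the dictionary data and the 50 morphemes. The function names are hypothetical.

```python
from collections import Counter

# Toy stand-in for the dictionary's two-character words (hypothetical data).
WORDS = ["电脑", "电话", "电视", "手机", "手表", "手电", "话题"]


def top_morphemes(words: list[str], n: int) -> list[str]:
    """Return the n most frequent characters across all words, counting
    front and back positions together (one character = one morpheme here)."""
    counts = Counter(ch for word in words for ch in word)
    return [ch for ch, _ in counts.most_common(n)]


def enclosed_words(words: list[str], morphemes: list[str]) -> list[str]:
    """Keep the words that contain at least one of the selected morphemes."""
    chosen = set(morphemes)
    return [w for w in words if chosen & set(w)]


selected = top_morphemes(WORDS, 2)
print(selected)                         # ['电', '手']
print(enclosed_words(WORDS, selected))  # every word except 话题
```

`Counter.most_common` breaks ties by first occurrence. With real data, a fixed cutoff such as 50 may therefore need an explicit tie-handling rule.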
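At their core, the distribution tables in the description system are relative frequencies of labels over the annotated words. This minimal sketch assumes a hypothetical record layout (`LabeledWord`) and English placeholder labels. These are neither the thesis's actual tag set nor HowNet sememes. Only two of the four labeling perspectives (morpheme senses and combination structure) are modeled here.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class LabeledWord:
    """One annotated two-character word (hypothetical record layout)."""
    word: str
    front_sense: str  # sense category of the front morpheme
    back_sense: str   # sense category of the back morpheme
    structure: str    # structure of semantic combination


def distribution(records: list[LabeledWord], field: str) -> dict[str, float]:
    """Relative frequency of each label in `field`, most frequent first."""
    counts = Counter(getattr(r, field) for r in records)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.most_common()}


# Placeholder glosses and structure labels, not the thesis's tag set.
RECORDS = [
    LabeledWord("电脑", "electricity", "brain", "modifier-head"),
    LabeledWord("电话", "electricity", "speech", "modifier-head"),
    LabeledWord("手机", "hand", "machine", "modifier-head"),
    LabeledWord("语言", "language", "speech", "coordinate"),
]

print(distribution(RECORDS, "structure"))    # {'modifier-head': 0.75, 'coordinate': 0.25}
print(distribution(RECORDS, "front_sense"))  # {'electricity': 0.5, 'hand': 0.25, 'language': 0.25}
```

Computing every table from the same labeled records keeps the tables mutually consistent. Each table's proportions sum to 1 over the enclosed word set.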
Keywords/Search Tags: two-character words, semantic word-formation, common unknown words, quantification