The Construction Of Chinese Morpheme Words Knowledge Base And Its Application In Understanding Unregistered Words

Posted on: 2018-07-20
Degree: Master
Type: Thesis
Country: China
Candidate: J J Qu
Full Text: PDF
GTID: 2355330518491081
Subject: Linguistics and Applied Linguistics
Abstract/Summary:
The Chinese vocabulary system is constantly developing and changing, so the number of unknown words is effectively unbounded. Morphemes, however, as the basic components of word formation, are relatively limited in number and stable in semantic function. In natural language processing, morphemes can therefore serve as a basic resource for acquiring word formation knowledge, which in turn can be used to recognize and understand unknown words. Most existing word formation knowledge bases, however, are used only for compiling statistics on word formation rules, so there is a gap between word formation knowledge bases and applied research on unknown words.

To obtain word formation knowledge that is more conducive to machine computation, we take the two-character words common to the Modern Chinese Dictionary (5th edition), HowNet (2009 edition) and Cilin (extended edition) and construct the Chinese Morpheme-word Knowledge Base. Each word sense forms one record, for a total of 39,102 records, and each record has 19 properties. The finest-grained properties are the two kinds of morpheme meaning, one based on HowNet and one based on the Modern Chinese Dictionary. In this paper, we use the Chinese Morpheme-word Knowledge Base for three lines of research.

First, we statistically analyze the word formation knowledge of nouns, verbs and adjectives in the Chinese Morpheme-word Knowledge Base from five aspects: semantic category, the combination of the morphemes' parts of speech, the combination of the morphemes' semantic categories, grammatical structure type, and the type of relationship between word meaning and morpheme meaning. We then analyze, within each part of speech and broken down by grammatical structure type, the combination of the morphemes' parts of speech, the combination of the morphemes' semantic categories, and the type of relationship between word meaning and morpheme meaning.

Second, based on the word formation knowledge in the Chinese Morpheme-word Knowledge Base, we use a phased algorithm to automatically predict the word formation knowledge of unknown words. Starting from the combination of morpheme meanings or the combination of the morphemes' semantic categories, we first predict the semantic-level knowledge, then determine the corresponding morphemes, and finally obtain the word formation knowledge of the unknown word. The algorithm is simple, intuitive and reasonable. A prediction counts as correct only if all seven predicted items are correct: the first morpheme's part of speech, the first morpheme's semantic category, the first morpheme's meaning, the last morpheme's part of speech, the last morpheme's semantic category, the last morpheme's meaning, and the grammatical structure type. The experimental results show that the prediction accuracy is 62.32% and the recall rate is 61.71%.

Third, based on the predicted word formation of unknown words, we use a word similarity measure based on word formation to find the word in the Chinese Morpheme-word Knowledge Base that is most similar to the unknown word, in order to achieve semantic understanding of unknown words. According to the experimental evaluation criteria, when the similarity threshold is 0.8, the effective rate of semantic understanding of unknown words is 51.69%.

To sum up, the Chinese Morpheme-word Knowledge Base we constructed achieves good results in applied research on unknown words and has important value for natural language processing.
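The all-seven-items-correct criterion described above can be illustrated with a short sketch. The abstract does not say how accuracy and recall are computed, so this assumes accuracy is the share of correct results among the words for which the predictor returned anything, and recall is the share of correct results among all test words. The class name, field names and toy labels below are hypothetical and are not taken from the thesis.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class WordFormation:
    """The seven predicted items for a two-character word (hypothetical field names)."""
    first_pos: str        # first morpheme's part of speech
    first_category: str   # first morpheme's semantic category
    first_meaning: str    # first morpheme's meaning
    last_pos: str         # last morpheme's part of speech
    last_category: str    # last morpheme's semantic category
    last_meaning: str     # last morpheme's meaning
    structure: str        # grammatical structure type


def evaluate(gold: dict[str, WordFormation],
             predicted: dict[str, Optional[WordFormation]]) -> tuple[float, float]:
    """Return (accuracy, recall) under the assumed definitions.

    A prediction is correct only if all seven fields match the gold record;
    dataclass equality compares every field, so `==` enforces exactly that.
    """
    attempted = [w for w in gold if predicted.get(w) is not None]
    correct = sum(1 for w in attempted if predicted[w] == gold[w])
    accuracy = correct / len(attempted) if attempted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return accuracy, recall


# Toy data with placeholder labels (not real knowledge base entries).
gold = {
    "word_a": WordFormation("N", "cat1", "sense1", "N", "cat2", "sense2", "modifier-head"),
    "word_b": WordFormation("V", "cat3", "sense3", "N", "cat4", "sense4", "verb-object"),
    "word_c": WordFormation("A", "cat5", "sense5", "A", "cat6", "sense6", "coordinate"),
}
predicted = {
    "word_a": gold["word_a"],                                  # all seven items match
    "word_b": WordFormation("V", "cat3", "sense3", "N", "cat4", "sense4", "coordinate"),  # one item wrong
    "word_c": None,                                            # predictor produced no output
}

accuracy, recall = evaluate(gold, predicted)
print(f"accuracy={accuracy:.2%} recall={recall:.2%}")
```

Under these assumed definitions, a word with no prediction lowers recall but not accuracy, which is one way the two reported figures could differ slightly.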
Keywords/Search Tags: Language knowledge base, HowNet, Unknown words, Sense guessing, Word similarity