Research On The Frame Selection Of Unknown Lexical Units Based On Word Similarity

Posted on: 2019-02-28
Degree: Master
Type: Thesis
Country: China
Candidate: J Wang
GTID: 2428330551460311
Subject: Computer technology
Abstract/Summary:
As a lexical semantic resource, Chinese FrameNet (CFN) can be widely used in Chinese information processing research, for example in question answering systems for reading comprehension. Like other semantic resources, however, it faces the problem of limited lexical unit coverage. When Chinese text is analyzed on the basis of frame semantics, limited coverage means that some target words can evoke a semantic scene described in CFN but are not listed under any existing frame. Such unknown lexical units hinder the normal course of the semantic analysis task. To improve lexical unit coverage in Chinese FrameNet, the existing lexical unit library must be enriched by adding lexical units to frames.

This work is based on the topic "the key technology of language problem solving and answer generation" in the National 863 project. It addresses the problem of unknown lexical units that arises during semantic analysis of reading comprehension texts from the Chinese college entrance examination, and takes the semantic similarity between an unknown lexical unit and the lexical units of each frame as its research angle. The frame selection task is complete once the unknown lexical unit has been assigned to the frame whose semantic scene is closest to it. The main work and research results of this paper are as follows.

Firstly, two methods are proposed and verified for the frame selection task of unknown lexical units.

(1) A frame selection method based on the HowNet semantic dictionary. Following the knowledge description language and the hierarchical semantic architecture of HowNet, the similarity between the unknown lexical unit and a frame lexical unit is computed through the transformation chain "word similarity - conceptual similarity - semantic similarity". Frames are then ranked from high to low by their similarity to the unknown lexical unit, which determines its frame selection range. This method achieves an accuracy of 70.38%.

(2) A frame selection method based on the Word2Vec word vector model. The Word2Vec tool is used to train word vectors on a large-scale corpus. Once the unknown lexical unit and the frame lexical units are represented as vectors, their similarity is computed with Euclidean distance and cosine similarity. Frames are again ranked from high to low by their similarity to the unknown lexical unit to determine its frame selection range, and the highest accuracy reaches 81.45%.

Finally, based on these two frame selection algorithms, this paper designs and implements a prototype system for frame selection of unknown lexical units in Chinese FrameNet. The system provides an automatic tool for handling unknown lexical units and can be applied to expanding the scale of Chinese FrameNet.
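The vector-based ranking step of method (2) can be illustrated with a minimal sketch. It is not the thesis's implementation: the toy three-dimensional vectors, the English frame names, and the function names are all hypothetical, and scoring a frame by the maximum cosine similarity over its lexical units is an assumption, since the abstract does not state how word-level similarities are combined into a frame-level score. In practice the vectors would come from a Word2Vec model trained on a large corpus.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_u * norm_v)


def euclidean_distance(u, v):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def rank_frames(unknown_vec, frames, vectors, top_k=3):
    """Rank frames from high to low similarity to the unknown word.

    Assumption: a frame's score is the best cosine similarity between
    the unknown word and any of the frame's lexical units.
    """
    scored = []
    for frame, lexical_units in frames.items():
        sims = [
            cosine_similarity(unknown_vec, vectors[lu])
            for lu in lexical_units
            if lu in vectors  # skip lexical units without a vector
        ]
        if sims:
            scored.append((frame, max(sims)))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical toy data standing in for trained word vectors.
toy_vectors = {
    "buy": [0.9, 0.1, 0.0],
    "sell": [0.8, 0.2, 0.1],
    "run": [0.0, 0.9, 0.3],
    "walk": [0.1, 0.8, 0.4],
}
# Hypothetical frames and their known lexical units.
toy_frames = {
    "Commerce": ["buy", "sell"],
    "Motion": ["run", "walk"],
}
# Vector for an unknown lexical unit, e.g. a word like "purchase".
unknown = [0.85, 0.15, 0.05]

ranking = rank_frames(unknown, toy_frames, toy_vectors)
for frame, score in ranking:
    print(frame, round(score, 3))
```

The top-ranked entries of `ranking` play the role of the frame selection range. Euclidean distance could replace cosine similarity by sorting in ascending order instead, since a smaller distance means a closer match.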
Keywords: Chinese FrameNet, Unknown Lexical Units, Word Similarity, HowNet, Word2Vec