
Research on Word Sense Disambiguation and Syntactic Parsing Based on Computing of Semantic Template

Posted on: 2023-05-01
Degree: Doctor
Type: Dissertation
Country: China
Candidate: W Wang
GTID: 1528307031978169
Subject: Computer software and theory
Abstract/Summary:
In natural language processing, word sense disambiguation (WSD) and syntactic parsing are both important fundamental tasks, widely used in information retrieval, machine translation, text comprehension and intelligent dialogue. Most existing research on WSD and syntactic parsing operates directly on word forms. Although this approach achieves high accuracy, it generalizes poorly, which readily leads to data sparsity and limits further gains in system performance. To address the insufficient generalization of direct word-form matching, this thesis exploits the fact that a single semantic classification code (semantic code for short) in a semantic dictionary can represent multiple words with similar meanings: words are first converted into semantic codes, and matching and computation are then performed on those codes. The thesis therefore focuses on constructing semantic templates from semantic codes and on computing with those templates, covering the following four aspects.

(1) Two theoretical models for computing semantic templates are proposed: the Sliding Match of Semantic String (SMOSS) model and the Stretchable Matching of Semantic Template (SMOST) model. Both models build semantic templates from the semantic codes of words, which alleviates the data sparsity of word templates and the weak disambiguating power of part-of-speech (POS) templates. The SMOSS model matches a whole sentence by sliding fixed-length N-gram semantic templates along the sentence's semantic-code string and matching at each position. It adopts the strategy of "integrating the voting results of multiple adjacent matching templates as output" to avoid the low reliability of arbitrarily selecting a single matching result. The SMOST model uses variable-length templates and stretchably adjusts the matching positions of template units so that units that fail to match can be bypassed; this removes the usual restriction that a template can match only at its corresponding fixed positions. It also adopts the strategy of "sorting matched nodes before constructing the node chain" to avoid combinatorial explosion when a node chain is built from many matched nodes.

(2) A WSD method based on the SMOSS model is proposed and implemented for both Chinese and English. On the standard training and test sets of the SemEval-2007 Chinese lexical-sample WSD task, its performance is close to the state of the art (SOTA), and it also achieves good results in an all-words WSD test on sentences from the TCT Chinese treebank whose semantic codes were labeled manually. These results support the validity of Chinese WSD based on the SMOSS model. On the SemEval-2015 English all-words WSD test set, however, its performance falls below expectations.

(3) A WSD method based on the SMOST model is proposed and implemented first for English and then for Chinese. The SMOST model is used to construct semantic node chains on the left and right sides of the target ambiguous word; the result with the highest total score over the left and right node chains is selected as the output. On the five English WSD test sets Senseval-2, Senseval-3, SemEval-2007, SemEval-2013 and SemEval-2015, its performance is close to that of the best supervised system under the same test conditions, which verifies the validity of English WSD based on the SMOST model. When the SMOST model is then applied to Chinese WSD, its results on the SemEval-2007 test set are much better than those of the SMOSS model, showing that SMOST outperforms SMOSS on both the English and the Chinese WSD tasks.

(4) A Chinese syntactic parsing method based on the SMOSS model is proposed. First, the semantic code of each word in a sentence is obtained through SMOSS-based Chinese WSD; these semantic codes are then chunked layer by layer with the SMOSS model to produce the parse. Experiments were carried out on the TCT Chinese treebank of Tsinghua University, with the Berkeley Parser as the baseline for comparison. The F1 score reached 99% in the closed test and 70% in the open test. These results demonstrate the feasibility of Chinese syntactic parsing that uses the semantic codes of words directly, in place of words and POS tags.
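The SMOSS voting strategy described in aspects (1) and (2) can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the function name `smoss_disambiguate`, placeholder codes such as `A1`, storing templates as a plain set of N-gram code tuples, fixing each context word to its first candidate code, and casting one unweighted vote per matching window are all assumptions made for illustration.

```python
from collections import Counter
from typing import Optional, Sequence


def smoss_disambiguate(
    codes: Sequence[Sequence[str]],
    target: int,
    templates: set[tuple[str, ...]],
    n: int = 3,
) -> Optional[str]:
    """Choose a semantic code for word `target` by N-gram template voting.

    codes[i] lists the candidate semantic codes of word i. Simplifying
    assumption (not from the thesis): every context word is represented
    by its first candidate code.
    """
    votes: Counter = Counter()
    # Slide every length-n window that still covers the target word.
    for start in range(target - n + 1, target + 1):
        end = start + n
        if start < 0 or end > len(codes):
            continue  # window would run past the sentence boundary
        for cand in codes[target]:
            window = tuple(
                cand if i == target else codes[i][0] for i in range(start, end)
            )
            if window in templates:
                votes[cand] += 1  # assumed scheme: each matching window casts one vote
    if not votes:
        return None  # no template matched at all
    # Integrate the votes of all adjacent matching windows instead of
    # trusting any single match.
    return votes.most_common(1)[0][0]


codes = [["A1"], ["B7", "C3"], ["D2"], ["E5"]]
templates = {("A1", "B7", "D2"), ("A1", "C3", "D2"), ("C3", "D2", "E5")}
print(smoss_disambiguate(codes, 1, templates))  # C3 wins two votes to one
```

Here the candidate `B7` is supported by one window and `C3` by two, so the vote selects `C3` rather than whichever single template happened to match first.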
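The SMOST ideas in aspects (1) and (3), namely stretchable matching that bypasses unmatched units, sorting matched nodes before building the node chain, and choosing the result with the highest combined left and right chain score, can be sketched as follows. This is a hypothetical illustration rather than the author's algorithm: `matched_nodes`, `best_node_chain`, `smost_wsd`, the per-sense left/right templates, and scoring a chain by its number of nodes are all assumptions. Sorting the nodes by sentence position lets a quadratic dynamic program find the best chain instead of enumerating every combination of matched nodes.

```python
from typing import Sequence

Node = tuple[int, int]  # (template unit index, sentence position)


def matched_nodes(template: Sequence[str], sentence: Sequence[str]) -> list[Node]:
    """All (unit, position) pairs where a template unit equals a sentence code."""
    return [(u, p) for u, code in enumerate(template)
            for p, word_code in enumerate(sentence) if code == word_code]


def best_node_chain(nodes: list[Node]) -> list[Node]:
    """Longest chain whose unit indices and positions both strictly increase.

    Unmatched template units and unmatched sentence positions are simply
    skipped, so the template "stretches" over gaps.
    """
    # Sort first: any valid predecessor of a node then appears before it.
    nodes = sorted(nodes, key=lambda nd: (nd[1], nd[0]))
    best_len = [1] * len(nodes)
    prev = [-1] * len(nodes)
    for j, (uj, pj) in enumerate(nodes):
        for i in range(j):
            ui, pi = nodes[i]
            if ui < uj and pi < pj and best_len[i] + 1 > best_len[j]:
                best_len[j] = best_len[i] + 1
                prev[j] = i
    if not nodes:
        return []
    k = max(range(len(nodes)), key=best_len.__getitem__)
    chain = []
    while k != -1:  # walk back through predecessors
        chain.append(nodes[k])
        k = prev[k]
    return chain[::-1]


def smost_wsd(left: Sequence[str], right: Sequence[str],
              sense_templates: dict[str, tuple[Sequence[str], Sequence[str]]]) -> str:
    """Return the sense whose left + right node chains score highest.

    sense_templates is a hypothetical mapping from each candidate sense
    code to a (left template, right template) pair, e.g. from training data.
    """
    def score(sense: str) -> int:
        left_t, right_t = sense_templates[sense]
        return (len(best_node_chain(matched_nodes(left_t, left)))
                + len(best_node_chain(matched_nodes(right_t, right))))
    return max(sense_templates, key=score)


template = ["A1", "B2", "C3", "D4"]
sentence = ["A1", "X9", "B2", "Y8", "D4"]
print(best_node_chain(matched_nodes(template, sentence)))
# [(0, 0), (1, 2), (3, 4)]: C3 is bypassed, X9 and Y8 are skipped
senses = {"S1": (["A1", "B2"], ["D4"]), "S2": (["Q1"], ["D4", "E5"])}
print(smost_wsd(["A1", "X9", "B2"], ["D4"], senses))  # S1 (score 3 vs 1)
```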
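The layer-by-layer chunking in aspect (4) can be pictured with a simplified sketch: at each layer, fixed-length (here N = 2) templates slide over the current sequence from left to right, and any adjacent pair that matches a rule is merged into a chunk label, until no further merge occurs. The function `chunk_layers`, the rule table, the greedy left-to-right merging, and the labels `NP` and `S` are hypothetical; the thesis's SMOSS-based chunker may choose merges differently (for example, by voting).

```python
from typing import Sequence


def chunk_layers(codes: Sequence[str], rules: dict[tuple[str, str], str],
                 max_layers: int = 20) -> list[list[str]]:
    """Bottom-up chunking: each layer merges adjacent pairs that match a rule.

    `rules` (hypothetical) maps a pair of adjacent semantic codes or chunk
    labels to the label of the chunk replacing them. Returns every layer,
    bottom layer first.
    """
    layers = [list(codes)]
    for _ in range(max_layers):
        current = layers[-1]
        nxt: list[str] = []
        i = 0
        while i < len(current):
            pair = tuple(current[i:i + 2])
            if len(pair) == 2 and pair in rules:
                nxt.append(rules[pair])  # merge the two units into one chunk
                i += 2
            else:
                nxt.append(current[i])  # no rule: carry the unit up unchanged
                i += 1
        if nxt == current:
            break  # no merge happened; the parse has converged
        layers.append(nxt)
    return layers


rules = {("A1", "B2"): "NP", ("NP", "C3"): "S"}
print(chunk_layers(["A1", "B2", "C3"], rules))
# [['A1', 'B2', 'C3'], ['NP', 'C3'], ['S']]
```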
Keywords: N-gram Semantic Template, Sliding Matching of Semantic String (SMOSS), Stretchable Matching of Semantic Template (SMOST), Word Sense Disambiguation, Syntactic Parsing