Research On Word-level Ambiguity Resolution Method

Posted on: 2023-03-12  Degree: Doctor  Type: Dissertation
Country: China  Candidate: M Wang  Full Text: PDF
GTID: 1528307028470664  Subject: Financial Information Engineering
Abstract/Summary:
Language ambiguity is an important research area in natural language processing. It is widespread across natural language processing scenarios, so machines face complex circumstances when processing natural language text, which has hindered the application of natural language processing techniques in daily life. This thesis focuses on word-level ambiguity resolution (word sense disambiguation) methods for English natural language corpora. It aims to improve current word sense disambiguation models in two categories, knowledge-based and supervised, and from various aspects, so that the improved models achieve state-of-the-art performance on standard English word sense disambiguation datasets. In a series of experiments, the methods proposed in this thesis proved effective in all-words word sense disambiguation, lexical-sample word sense disambiguation, cross-language disambiguation, few-shot disambiguation and zero-shot disambiguation, obtaining state-of-the-art performance. The thesis also explores the application of word sense disambiguation to downstream tasks, showing that word sense disambiguation and word sense information injection contribute to those tasks. The major research work and innovations are as follows:

1) This thesis makes the first attempt to propose a hybrid disambiguation method that combines two knowledge-based word sense disambiguation methods. The method first relies on domain information retrieval and the exploitation of asymmetric word sense relationships to learn word sense representations, and builds a similarity-based disambiguation method on them. The obtained similarity is then used to initialize the importance of the word sense nodes in the graph-based disambiguation method. Finally, the word sense importance is obtained by weighting the results of the two methods. Experiments show that the proposed hybrid knowledge-based word sense disambiguation method outperforms the previous state-of-the-art by 2.8% F1.

2) For the first time, a context expansion method based on the documents under disambiguation and a secondary disambiguation method based on word sense relations are proposed in a framework compatible with both the word sense knowledge base and the word sense tagged corpus. The method takes a pre-trained language model as the basis of representation learning. It uses the word sense knowledge from the knowledge base and the annotated corpus to learn the word sense representation, and uses the context information of the document under disambiguation to learn the ambiguous word’s representation. Further, a try-again disambiguation method based on word sense relationships is proposed for the first time. Experiments show that the proposed word sense disambiguation framework outperforms the previous state-of-the-art in both the knowledge-based and the supervised category, by 6.7% and 0.8% F1 respectively. In addition, the proposed method obtains state-of-the-art results in lexical-sample, few-shot and zero-shot word sense disambiguation, 0.8%, 9.7% and 15.2% F1 higher than the previous best systems, respectively.

3) The thesis makes the first attempt to explicitly model the word sense relations in context in a supervised word sense disambiguation framework. The method models the context at both the word level and the word sense level, learning the interrelatedness between words and between word senses in the context respectively. It removes the dependence on word sense relationships defined by experts and instead learns the relationships between word senses from the annotated corpus. Experiments show better performance of the proposed system on all-words, few-shot, zero-shot and multilingual word sense disambiguation datasets, with F1 improvements of 3.7%, 5.4%, 5.2% and 1.2%, respectively.

4) For the first time, a partial word disambiguation method is proposed, and the word senses obtained after disambiguation are embedded into a text classification model for supervised training. The method uses word sense category information to disambiguate a proportion of the words, and then integrates the word sense information as additional semantic information into the text classification model for supervised training, so as to enhance the semantic modeling ability of the model.
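
The hybrid method in contribution 1) can be illustrated with a minimal sketch. It is not the thesis's implementation: the abstract gives no formulas, so everything here is an assumption made for illustration. That covers cosine similarity as the similarity measure, personalized PageRank as the graph-based method, the mixing weight `alpha`, and all function names. It only shows the described flow: similarity scores initialize the sense node importance, the graph method refines it, and the two results are weighted.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def similarity_scores(context_vec, sense_vecs):
    """Similarity-based step: score each sense against the context, normalized to sum to 1."""
    raw = {s: max(cosine(context_vec, v), 0.0) for s, v in sense_vecs.items()}
    total = sum(raw.values())
    if total == 0.0:
        return {s: 1.0 / len(raw) for s in raw}  # no evidence: uniform prior
    return {s: r / total for s, r in raw.items()}


def personalized_pagerank(graph, init, damping=0.85, iters=50):
    """Graph-based step (assumed here to be personalized PageRank).

    graph maps each sense node to the nodes it points to; edges are directed,
    so asymmetric sense relations can be represented. init is both the
    starting importance and the restart distribution.
    """
    rank = dict(init)
    for _ in range(iters):
        nxt = {n: (1.0 - damping) * init[n] for n in graph}
        for n, outs in graph.items():
            if outs:
                share = damping * rank[n] / len(outs)
                for m in outs:
                    nxt[m] += share
            else:
                # Dangling node: return its mass according to the restart distribution.
                for m in graph:
                    nxt[m] += damping * rank[n] * init[m]
        rank = nxt
    return rank


def hybrid_scores(context_vec, sense_vecs, graph, alpha=0.5):
    """Weight the similarity-based and graph-based results (alpha is hypothetical)."""
    sim = similarity_scores(context_vec, sense_vecs)
    graph_rank = personalized_pagerank(graph, sim)
    return {s: alpha * sim[s] + (1.0 - alpha) * graph_rank[s] for s in sense_vecs}


if __name__ == "__main__":
    # Toy data: two candidate senses with made-up two-dimensional vectors.
    senses = {"bank.finance": [1.0, 0.0], "bank.river": [0.0, 1.0]}
    relations = {"bank.finance": ["bank.river"], "bank.river": ["bank.finance"]}
    scores = hybrid_scores([0.9, 0.1], senses, relations)
    print(max(scores, key=scores.get))
```

The sketch assumes the graph's nodes are exactly the candidate senses in `sense_vecs`. A real system would build the graph over the senses of all words in the context and learn the sense vectors from the knowledge base.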
Keywords/Search Tags: Natural Language Processing, Word Sense Disambiguation, Knowledge Base, Pre-trained Language Model, Text Classification