Word Sense Disambiguation Method Based On WIC Dataset

Posted on:2022-10-16

Degree:Master

Type:Thesis

Country:China

Candidate:Z L Xie

Full Text:PDF

GTID:2507306527452414

Subject:Applied Statistics

Abstract/Summary:

Word sense disambiguation is one of the most important tasks in natural language processing. It affects the capabilities of almost all applications, including text translation and search engines. With the rapid development of deep learning, it has become possible for computers to process large amounts of text data, and the arrival of pre-trained text models in recent years has made it quick and easy to optimize and improve algorithms. The research goal has also moved up from word representation to sense representation, and word sense disambiguation algorithms built on different theoretical foundations and external knowledge keep emerging. To compare the currently popular word sense disambiguation methods and models, this thesis uses the MCL-WiC dataset, derived from WiC, as the sample for evaluating model ability. We build two kinds of models. The first is based on contextual word embeddings: it uses deep learning models such as ELMo and BERT that have undergone unsupervised training on large corpora, and pairs the output word vectors with different classification methods to compare and classify word meanings. The second uses external resources, such as labeled text libraries or online dictionaries, to build a supervised sense representation of the target word, and obtains the result from the best-matching sense comparison. For a classification problem such as the WiC dataset, the experimental results show that classifying directly with word vectors gives significantly higher accuracy than using sense representations. After the sense embeddings are fine-tuned into the BERT model, the expected results are not achieved. Sense representation is affected by resource quality and sense definition. In selecting the pre-trained model for the context-based word embeddings, ELMo differs little from BERT_BASE and is significantly better than the more complex BERT_LARGE. After comparing various classification algorithms, the cosine distance measure and Neumf achieve better results in most cases; in particular, the ELMo+Neumf model achieves 70.5% accuracy and an F1-score of 73.4 on the validation dataset.
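
The cosine-based comparison mentioned in the abstract can be illustrated with a minimal sketch. It assumes the contextual vectors of the target word in the two sentences have already been extracted (for example from ELMo or BERT). The function names, the toy vectors, and the 0.5 threshold are hypothetical and are not taken from the thesis; in practice the threshold would be tuned on held-out data.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def same_sense(u: Sequence[float], v: Sequence[float],
               threshold: float = 0.5) -> bool:
    """WiC-style binary decision: True if the two target-word vectors are
    similar enough. The threshold value is an assumption for illustration."""
    return cosine_similarity(u, v) >= threshold


def accuracy_and_f1(predictions: List[bool],
                    labels: List[bool]) -> Tuple[float, float]:
    """Accuracy and F1 of the positive (same-sense) class, as fractions."""
    tp = sum(p and y for p, y in zip(predictions, labels))
    fp = sum(p and not y for p, y in zip(predictions, labels))
    fn = sum((not p) and y for p, y in zip(predictions, labels))
    correct = sum(p == y for p, y in zip(predictions, labels))
    accuracy = correct / len(labels) if labels else 0.0
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    return accuracy, f1


# Toy, hand-made vectors standing in for contextual embeddings (hypothetical).
pairs = [
    ([1.0, 0.0, 0.2], [0.9, 0.1, 0.3]),   # similar contexts
    ([1.0, 0.0, 0.0], [0.0, 1.0, 0.0]),   # unrelated contexts
]
predictions = [same_sense(u, v) for u, v in pairs]
```

The scores returned here are fractions in [0, 1], whereas the abstract reports them on a 0 to 100 scale.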

Keywords/Search Tags:

Word Sense Disambiguation, Language Model, Sense Representation

Related items

1	The Exploration On The Methodology Of Language Sense For Middle School Students
2	A Study Of Cultivating Language Sense Of Junior High School Students
3	An Exploration On Cultivating High School Art Students’ Language Sense
4	Exploring A New Path Of Chinese Character Education In The Mainland Based On Taiwan's "word Sense Literacy Teaching Method"
5	Research On The Teaching Strategy Of Reading Aloud For Language Sense Training In Junior Middle School
6	Sense Of The Phrase In The Secondary Language Teaching Training
7	Solutions For Problems In Senior High School Language Sense Teaching
8	A Study On The Current Situation And Training Strategies Of Middle-grade Pupils' Sense Of Chinese Language
9	Research And Practice Of Language Sense Training Based On The Analysis Of Vocabulary Characteristics Of High-level Chinese
10	Sense Of Language Teaching In The New Curriculum Under Discussion