| In computational linguistics, word sense disambiguation is an important problem in natural language processing. Word sense disambiguation refers to the process of determining the meaning of a word from its context. Words take on different meanings in different semantic contexts, and this ambiguity appears at the word, sentence, and discourse levels. Natural language processing systems such as machine translation, speech recognition, text classification, and automatic summarization all involve disambiguation, so improving disambiguation accuracy is particularly important for making these systems more effective. The main research content of this paper is as follows: (1) Word senses are not unique and semantic coding is imperfect across different dictionaries, which hinders the application of deep learning to word sense disambiguation. To address this, the existing "Synonym Cilin" and "Modern Chinese Dictionary" resources are used to integrate word senses. This paper compares how the two dictionaries divide the senses of polysemous words in order to find a more reasonable sense division and to provide sense inventories and codes for polysemous words for word sense disambiguation research. (2) To address the shortage of corpus data for polysemous words, this paper proposes a method for automatically constructing a polysemous-word corpus based on a pseudo-instance clustering algorithm. Polysemous words and their equivalent pseudo-words are identified from the "Modern Chinese Dictionary" together with "Synonym Cilin". The equivalent pseudo-words are then used to retrieve a sufficient number of pseudo-instances from the unlabeled SogouCA news corpus, which prepares the data for the entire experimental stage. The most critical part of the method is an improved clustering algorithm based on the shuffled frog leaping algorithm, which clusters the pseudo-instances multiple times to obtain the best clustering result. Finally, the similarity between each sense entry and each cluster is calculated to guide the correct classification of the pseudo-instances, which are then given uniform labels. Experimental comparison shows that the proposed corpus construction method based on pseudo-instance clustering achieves an average accuracy of 74.9%, which is higher than the average accuracy of the second-order context vector method, the rule-based mixed feature method, and the hidden Markov algorithm. The method is reliable and practical to implement, and it can effectively address the lack of corpus data when deep learning algorithms are applied to word sense disambiguation tasks. (3) This paper uses the integrated polysemous word senses and the extended corpus as the basis of the data set, overcoming the problems of insufficient corpus data and inconsistent coding, and applies deep learning to the word sense disambiguation task. A BERT-BiLSTM architecture is used to construct the word sense disambiguation model: word vectors trained by the BERT model serve as input, part-of-speech vector features are added, and a BiLSTM neural network performs the disambiguation. Experiments show that BERT word vectors better preserve the context information of text sequences and better capture the relationships between semantic features. On the same data set, the model reaches a disambiguation accuracy of 86.10%, which is higher than that of the supervised word sense disambiguation model based on context translation, the BiLSTM model based on target word interpretation combined with example sentence information, and the DBN model based on part of speech and character form combined with semantics. |
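
The labeling step in (2), where each cluster of pseudo-instances is matched to a sense by similarity, can be illustrated with a minimal sketch. Everything here is an assumption for illustration, not the paper's implementation: the bag-of-words representation, the cosine similarity measure, the function names (`bow`, `cosine`, `label_clusters`), and the toy English data are all hypothetical, and the clusters are taken as given rather than produced by the shuffled frog leaping clustering described above.

```python
import math
from collections import Counter


def bow(tokens):
    """Bag-of-words vector as a token -> count mapping (assumed representation)."""
    return Counter(tokens)


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def label_clusters(sense_glosses, clusters):
    """Give every instance in a cluster the label of the most similar sense.

    sense_glosses: dict mapping a sense label to its list of gloss tokens
    clusters: list of clusters, each a list of tokenized pseudo-instances
    Returns a list of (instance_tokens, sense_label) pairs.
    """
    gloss_vecs = {sense: bow(toks) for sense, toks in sense_glosses.items()}
    labeled = []
    for cluster in clusters:
        # Represent the cluster by the summed token counts of its instances.
        centroid = Counter()
        for instance in cluster:
            centroid.update(instance)
        best = max(gloss_vecs, key=lambda s: cosine(gloss_vecs[s], centroid))
        labeled.extend((instance, best) for instance in cluster)
    return labeled


# Hypothetical toy data: two senses of "bank" and two pre-computed clusters.
senses = {
    "bank_finance": ["money", "deposit", "loan"],
    "bank_river": ["river", "water", "shore"],
}
clusters = [
    [["deposit", "money", "account"], ["loan", "interest", "money"]],
    [["river", "shore", "fishing"], ["water", "flood", "river"]],
]
labeled = label_clusters(senses, clusters)
```

Labeling whole clusters rather than single instances means one similarity decision assigns a uniform label to every pseudo-instance in the cluster, which is the effect the abstract describes.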