Word vectors are distributed representations of words: they map each word into a dense, continuous vector of fixed length. This representation effectively and flexibly preserves prior knowledge and, when integrated into specific tasks, achieves strong results across many areas of natural language processing. Semantic similarity quantitatively measures how similar two words or concepts are; it underlies natural language understanding and is widely used in related tasks. This paper analyses semantic similarity from the perspective of word vectors and presents a multi-sense word vector training model to improve similarity computation between words and between sentences. Because traditional word vector training models do not distinguish the different senses of polysemous words, each word is represented by a single vector, which cannot handle polysemy or synonymy. Multi-sense word vectors resolve this semantic confusion by mapping the different senses of a word to different vectors. In this paper, we use word sense disambiguation to preprocess a Wikipedia dataset and obtain an annotated corpus that distinguishes the different senses of polysemous words. We then use the improved word vector model to train multi-sense word vector representations and apply them to the calculation of sentence semantic similarity. The specific research contents include:

(1) Word sense disambiguation. To distinguish the different meanings of words in sentences, this paper proposes a word sense disambiguation model based on a bidirectional recurrent neural network. Using a bidirectional LSTM to capture the contextual features around a polysemous word improves on traditional disambiguation models. In constructing the model, we introduce an attention-based state composition mechanism to provide better disambiguation features. Experimental results show that our model achieves the best results on the SemEval-2007 and SemEval-2013 datasets.

(2) Multi-sense word vector representation and similarity computation. This paper proposes a multi-sense word vector representation method based on the GloVe model. Using word sense disambiguation techniques, the polysemous words in the corpus are sense-tagged to produce an annotated corpus that distinguishes their different meanings; multi-sense word vector representations are then trained on this corpus. The proposed training method solves the semantic-confusion problem that polysemous words cause in traditional word vector training. Finally, we report experimental results on nearest-neighbour words and on word similarity datasets. The results show that our training model distinguishes the vector representations of the different senses of polysemous words and achieves good results on the SCWS word similarity dataset.

(3) Sentence similarity calculation based on multi-sense word vectors. In addition to using a traditional recurrent neural network to compute sentence semantic similarity, this paper presents a method that uses multi-sense word vectors in the similarity calculation. First, we present a sentence similarity method based on a Siamese neural network. Then, we propose two methods, simple sense averaging and an attention mechanism, to integrate multi-sense word vectors with an LSTM. Finally, we report experimental results comparing multi-sense and traditional word vector representations in sentence similarity calculation.
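To make the corpus-annotation step concrete: once every polysemous token carries a sense tag, the two senses of a word accumulate separate co-occurrence statistics, which is what lets a GloVe-style model assign them separate vectors. The sketch below builds such distance-weighted co-occurrence counts; the `word#senseId` tagging scheme, the window size, and the toy corpus are illustrative assumptions, not the exact setup used in this work.

```python
from collections import defaultdict

def cooccurrence_counts(sense_tagged_sentences, window=2):
    """Count symmetric, distance-weighted co-occurrences over a sense-tagged corpus.

    Each token is a string like "bank#1" (word + sense id), so the two
    senses of "bank" accumulate disjoint context statistics and would
    receive separate embeddings from a GloVe-style trainer downstream.
    """
    counts = defaultdict(float)
    for sent in sense_tagged_sentences:
        for i, center in enumerate(sent):
            lo, hi = max(0, i - window), min(len(sent), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    # GloVe-style weighting: nearer context words count more
                    counts[(center, sent[j])] += 1.0 / abs(j - i)
    return counts

corpus = [["deposit", "money", "in", "the", "bank#1"],
          ["sit", "on", "the", "river", "bank#2"]]
counts = cooccurrence_counts(corpus)
```

Here `counts[("bank#1", "the")]` and `counts[("bank#2", "river")]` are populated, while `("bank#1", "river")` never co-occurs, so the financial and river senses end up with distinct context distributions.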
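The simple sense-averaging variant of sentence similarity can be sketched without the Siamese LSTM: each word contributes the mean of its sense vectors, a sentence vector is the mean of its word vectors, and similarity is the cosine between two sentence vectors. The 2-dimensional sense vectors below are made-up toy values for illustration, not trained embeddings.

```python
import math

def sentence_vector(tokens, sense_vectors):
    """Simple sense averaging: a word contributes the mean of its sense vectors,
    and the sentence vector is the mean over the sentence's words."""
    dim = len(next(iter(sense_vectors.values()))[0])
    acc = [0.0] * dim
    for tok in tokens:
        senses = sense_vectors[tok]
        for k in range(dim):
            acc[k] += sum(v[k] for v in senses) / len(senses)
    return [x / len(tokens) for x in acc]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

# Toy sense vectors: "bank" is polysemous and has two sense embeddings.
vecs = {"bank": [[1.0, 0.0], [0.0, 1.0]],
        "money": [[1.0, 0.2]],
        "river": [[0.1, 1.0]]}
s1 = sentence_vector(["bank", "money"], vecs)
s2 = sentence_vector(["bank", "river"], vecs)
sim = cosine(s1, s2)
```

In the full model the per-word average is replaced either by the vector of the disambiguated sense or by an attention-weighted combination of sense vectors, and the mean pooling is replaced by a Siamese LSTM encoder; this sketch only shows the averaging baseline.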