
Comments Sentiment Analysis Based On Improved Word2vec

Posted on: 2019-06-08 | Degree: Master | Type: Thesis
Country: China | Candidate: Y Y Liang | Full Text: PDF
GTID: 2428330545972910 | Subject: Computer technology
Abstract/Summary:
Word embedding is a breakthrough of deep learning in Natural Language Processing (NLP). It transforms words into vector representations, and word vectors are widely used in various NLP tasks. Word2vec, released by Google in 2013, is an efficient tool for representing words as real-valued vectors.

A word is the smallest unit of text that carries semantic information. English words are made up of letters, and Chinese words consist of Chinese characters. Building on methods for English word representation, a character-enhanced word embedding method has been proposed for Chinese. Research shows that word vectors learned with the semantic information of Chinese characters perform well in some NLP tasks. However, the existing model is limited in how it accounts for the semantic contribution of Chinese characters to words, so the learned word vectors are unsatisfactory in some NLP tasks.

To address this problem, this paper proposes an attention-based character-enhanced word embedding model (ACWE), which uses the extended version of TongYiCi Cilin to calculate the semantic similarity between words. The experimental results show that the word vectors obtained by this method are superior to those of the existing baseline model in terms of the semantic relatedness of words. Moreover, the improved Word2vec has been applied to a sentiment analysis task on Sina micro-blog reviews and has achieved good results.

The specific work of this paper is as follows:

1. This paper proposes an incremental learning method for word embedding, so that the model does not need to retrain on the entire corpus when new content is added; it performs only the updates caused by the new data.
2. This paper proposes an attention-based character-enhanced word embedding model (ACWE), which uses Cilin to calculate the degree of semantic contribution between words. Semantic relatedness experiments show that the proposed method is superior to the existing baseline model.
3. The improved Word2vec is applied to the sentiment analysis of microblog reviews, and its effectiveness is verified by experiments.
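
The abstract does not give the ACWE formula, so the following is only a minimal sketch of the general idea of attention-weighted character composition, not the thesis's actual model. It assumes that each character of a word comes with a similarity score (for example, one looked up in a thesaurus such as Cilin), that the scores are normalized with a softmax, and that the word-level and character-level parts are averaged with equal weight. The function names `softmax` and `compose_word_vector`, the score inputs, and the 0.5 weighting are all hypothetical choices made for illustration.

```python
import math


def softmax(scores):
    """Turn raw scores into non-negative weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def compose_word_vector(word_vec, char_vecs, char_scores):
    """Combine a word vector with an attention-weighted sum of its character vectors.

    char_scores are hypothetical per-character similarity scores; how the
    thesis derives them from Cilin is not specified in the abstract.
    """
    if len(char_vecs) != len(char_scores):
        raise ValueError("need exactly one score per character vector")
    weights = softmax(char_scores)
    dim = len(word_vec)
    # Attention-weighted sum of the character vectors, one dimension at a time.
    char_part = [
        sum(w * c[i] for w, c in zip(weights, char_vecs)) for i in range(dim)
    ]
    # Assumption: word-level and character-level parts contribute equally.
    return [0.5 * (word_vec[i] + char_part[i]) for i in range(dim)]


if __name__ == "__main__":
    # Toy 2-dimensional example: a two-character word whose second
    # character is scored as more related to the word's meaning.
    word = [0.2, 0.4]
    chars = [[1.0, 0.0], [0.0, 1.0]]
    print(compose_word_vector(word, chars, [0.1, 0.9]))
```

With equal scores the weights reduce to a plain average of the character vectors; unequal scores let a semantically closer character contribute more, which is the limitation of uniform weighting that the abstract describes.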
Keywords/Search Tags: Incremental learning, Attention mechanism, Cilin, Word2vec, Sentiment analysis