
Sentiment Classification Of Microblog Based On Word Embedding And Convolutional Neural Network

Posted on: 2018-07-12 | Degree: Master | Type: Thesis
Country: China | Candidate: F F Niu
GTID: 2428330515989695 | Subject: Computer software and theory
Abstract/Summary:
Built on neural language models, word embedding techniques can automatically learn effective feature representations of language units such as words, phrases and sentences from large-scale unlabeled text, and they have led to important progress in many natural language processing tasks. This thesis studies the sentiment classification of microblog text. The research consists of two parts.

First, we explore how emoticons express sentiment in Chinese microblog sentiment analysis and how they differ from sentiment words. We collected the emoticons in the training set, sorted them by frequency of use, and annotated their sentiment. We found that the more frequently an emoticon is used, the more likely it is to carry a sentiment, and the more likely that sentiment is to be clear-cut. Because emoticons and sentiment words play different grammatical roles in a sentence, we selected the five most representative emoticons and the five most representative sentiment words for each of the four most commonly used emotions in the NLPCC data set, and we used distributed word representations to learn semantic vectors for the emoticons. Both sets of vectors were then reduced with PCA, and their layout in the resulting two-dimensional space was examined. The results show that the word embedding space separates sentiment semantics to some degree and that emoticons are separated more clearly; that is, emoticons discriminate sentiment more strongly than sentiment words do.

Second, we propose a parallel word embedding convolutional neural network model that combines Chinese characters and words to improve microblog sentiment classification. Because Chinese microblog text is hard to segment into words and segmentation has a high error rate, we examine how traditional machine learning models and the CNN model perform on Chinese microblog sentiment classification when Chinese characters and words, respectively, are used as the language unit. The experimental results show that each unit has its own advantages and that combining the two features helps improve classification. Compared with the multinomial naive Bayes (MNB) and SVM baseline classifiers, the proposed model significantly improves the accuracy of Chinese microblog sentiment classification, by 1.72% to 2.64%, and alleviates the "curse of dimensionality" in the feature space. Since the choice of Chinese characters or words as the basic unit of the statistical features affects all of these models, effectively integrating the two features can improve Chinese microblog sentiment classification performance.
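The first study begins by collecting emoticons from the training set and sorting them by frequency of use before annotating them. The sketch below shows only the counting and ranking step. It assumes, as a labeled assumption, that emoticons appear inline as bracketed codes such as "[哈哈]" (the plain-text form Sina Weibo uses); the thesis does not state its extraction rule, and the posts and the name `rank_emoticons` are hypothetical.

```python
import re
from collections import Counter

# Assumption: emoticons are written inline as short bracketed codes like "[哈哈]".
# The thesis does not specify how emoticons were extracted.
EMOTICON_PATTERN = re.compile(r"\[[^\[\]]{1,8}\]")


def rank_emoticons(posts):
    """Count emoticon codes across all posts; return (code, count) pairs, most frequent first."""
    counts = Counter()
    for post in posts:
        counts.update(EMOTICON_PATTERN.findall(post))
    return counts.most_common()


if __name__ == "__main__":
    # Toy posts invented for illustration; not data from the thesis.
    posts = [
        "今天好开心[哈哈][哈哈]",
        "排队两个小时[怒]",
        "终于放假了[哈哈][爱你]",
    ]
    for code, n in rank_emoticons(posts):
        print(code, n)
```

The ranked list is what a human annotator would then label with a sentiment, working down from the most frequent emoticons.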
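The emoticon and sentiment-word vectors are compared by projecting them onto two dimensions with PCA. As a minimal, dependency-free sketch (not the thesis's implementation), the function below centers the vectors, builds the sample covariance matrix, and finds the top two principal components by power iteration with deflation. The function names are hypothetical; in practice a library routine such as scikit-learn's `PCA(n_components=2)` would normally be used instead.

```python
import math
import random


def _mat_vec(m, v):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def _top_eigenvector(m, iters=500, seed=0):
    """Power iteration: dominant eigenvector of a symmetric positive semidefinite matrix."""
    rng = random.Random(seed)
    v = [rng.random() + 0.1 for _ in range(len(m))]
    norm0 = math.sqrt(sum(x * x for x in v))
    v = [x / norm0 for x in v]
    for _ in range(iters):
        w = _mat_vec(m, v)
        norm = math.sqrt(sum(x * x for x in w))
        if norm < 1e-12:  # no variance left in this matrix
            break
        v = [x / norm for x in w]
    return v


def pca_2d(vectors):
    """Project row vectors (at least two) onto their first two principal components."""
    n, d = len(vectors), len(vectors[0])
    if n < 2:
        raise ValueError("need at least two vectors")
    mean = [sum(v[j] for v in vectors) / n for j in range(d)]
    centered = [[v[j] - mean[j] for j in range(d)] for v in vectors]
    # Sample covariance matrix, d x d.
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)] for i in range(d)]
    components = []
    for _ in range(2):
        pc = _top_eigenvector(cov)
        lam = sum(a * b for a, b in zip(pc, _mat_vec(cov, pc)))  # Rayleigh quotient = eigenvalue
        components.append(pc)
        # Deflation: subtract this component so the next power iteration finds the next one.
        cov = [[cov[i][j] - lam * pc[i] * pc[j] for j in range(d)] for i in range(d)]
    return [[sum(a * b for a, b in zip(r, pc)) for pc in components] for r in centered]


if __name__ == "__main__":
    # Toy 3-D vectors standing in for learned embeddings (hypothetical values).
    points = pca_2d([[3, 0, 0], [-3, 0, 0], [0, 1, 0], [0, -1, 0]])
    print(points)
```

In the setting the abstract describes, the input rows would be the learned emoticon and sentiment-word embeddings, and the 2-D points would be plotted and colored by emotion to judge how well each group separates. Component signs are arbitrary, as with any PCA.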
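The abstract does not give the architecture of the parallel character/word CNN (embedding size, filter widths, or how the two branches are merged). The sketch below assumes, purely for illustration, the common text-CNN design for each branch (embedding lookup, 1-D convolution, ReLU, max-over-time pooling) and merges the branches by concatenating their pooled features before a softmax output layer. It is a forward pass with random, untrained weights; training by backpropagation is omitted, and all class names, sizes, and the word segmentation shown are hypothetical.

```python
import math
import random


class Branch:
    """One CNN branch over a single language unit (characters or words)."""

    def __init__(self, vocab, dim, n_filters, width, rng):
        self.index = {tok: i for i, tok in enumerate(vocab)}
        self.emb = [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in vocab]
        self.unk = [0.0] * dim  # unknown tokens map to a zero vector
        self.width = width
        # Each filter spans `width` consecutive embeddings, flattened, plus a bias.
        self.filters = [([rng.uniform(-0.1, 0.1) for _ in range(width * dim)], 0.0)
                        for _ in range(n_filters)]

    def features(self, tokens):
        seq = [self.emb[self.index[t]] if t in self.index else self.unk for t in tokens]
        seq += [self.unk] * max(0, self.width - len(seq))  # pad so one window always exists
        pooled = []
        for w, b in self.filters:
            best = 0.0  # ReLU outputs are >= 0, so max-over-time of ReLU starts at 0
            for start in range(len(seq) - self.width + 1):
                window = [x for vec in seq[start:start + self.width] for x in vec]
                best = max(best, sum(a * c for a, c in zip(w, window)) + b)
            pooled.append(best)
        return pooled


class ParallelCharWordCNN:
    """Character branch and word branch run in parallel; pooled features are concatenated."""

    def __init__(self, char_vocab, word_vocab, dim=8, n_filters=4, width=2, n_classes=2, seed=0):
        rng = random.Random(seed)
        self.char_branch = Branch(char_vocab, dim, n_filters, width, rng)
        self.word_branch = Branch(word_vocab, dim, n_filters, width, rng)
        self.out_w = [[rng.uniform(-0.1, 0.1) for _ in range(2 * n_filters)]
                      for _ in range(n_classes)]
        self.out_b = [0.0] * n_classes

    def predict_proba(self, chars, words):
        feats = self.char_branch.features(chars) + self.word_branch.features(words)
        logits = [sum(w * f for w, f in zip(row, feats)) + b
                  for row, b in zip(self.out_w, self.out_b)]
        m = max(logits)  # subtract the max for a numerically stable softmax
        exps = [math.exp(z - m) for z in logits]
        total = sum(exps)
        return [e / total for e in exps]


if __name__ == "__main__":
    text = "今天很开心"
    words = ["今天", "很", "开心"]  # hypothetical word-segmenter output
    model = ParallelCharWordCNN(char_vocab=list(text), word_vocab=words)
    # Weights are untrained, so these probabilities carry no real sentiment signal.
    print(model.predict_proba(list(text), words))
```

Feeding the character sequence alongside the segmented words is what lets such a model fall back on character features when segmentation errors corrupt the word sequence, which is the motivation the abstract gives for combining the two units.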
Keywords/Search Tags: microblog, sentiment analysis, emoticon, word embedding, convolutional neural network