
Research On Text Sentiment Classification Based On Language Model And Machine Learning

Posted on: 2018-05-13  Degree: Master  Type: Thesis
Country: China  Candidate: Q Zhang  Full Text: PDF
GTID: 2348330542951472  Subject: Information and Communication Engineering
Abstract/Summary:
In the era of self-media ushered in by micro-blogging, text sentiment recognition is a hot topic in natural language processing. It is an interdisciplinary problem spanning cognitive science, physiology, psychology, linguistics, computer science, and natural language processing, and it is attracting growing attention from research institutions and researchers at home and abroad. The main contents and innovations of this paper are as follows:

(1) This paper expounds the research background and significance of text sentiment classification, summarizes the current state of research at home and abroad, and identifies the theoretical and technical problems that still need to be studied and solved.

(2) This paper summarizes basic knowledge related to emotion, including the definition and classification of emotion. It then introduces text preprocessing techniques: Chinese word segmentation, stop-word filtering, and part-of-speech tagging. Next, it describes several methods for constructing an affective lexicon: an emotional-word dictionary, a polarity adverb dictionary, and an emoticon (facial expression) dictionary. Finally, it focuses on common text features (bag-of-words, text vector, term frequency, and term frequency-inverse document frequency) and on feature selection methods (chi-square test, information gain, and mutual information).

(3) This paper proposes using language models as features for text sentiment classification. Taking the n-gram language model as an example, it introduces the construction of statistical language models and common smoothing algorithms. It then introduces three classification algorithms from machine learning: nearest-neighbor classification, Naive Bayes classification, and support vector machine classification. Three Weibo corpora are used to test which features and which method perform better.

(4) This paper proposes an optimization and a simplification of the language-model features used in sentiment classification. For optimization, it adds text-level and sentence-level features and puts forward a topic-weighted language model; both new features improve the recognition results. For simplification, it discusses the necessity of word clustering and proposes Word2Vec-based clustering to reduce the number of word language-model features.

(5) This paper introduces the basic principles of deep learning, the learning rules of neural networks, and common models, and proposes text sentiment classification methods based on convolutional neural networks and recurrent neural networks (RNN). On the three micro-blog comment corpora, the RNN achieves better recognition results than conventional machine learning. Finally, the selection of several neural network parameters is settled by experiment: with a learning rate of 0.001, 128 hidden-layer nodes in the RNN module, and 128 hidden-layer nodes in the feedforward module, the network performs better in convergence speed, stability, and recognition results.
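The idea in item (3), using language-model scores as sentiment features, can be illustrated with a minimal sketch. The abstract does not state the n-gram order, the smoothing method, or how the scores are passed to the classifiers, so everything specific here is an assumption: a bigram model, add-one (Laplace) smoothing, one model per sentiment class, and a simple "highest score wins" rule standing in for the nearest-neighbor, Naive Bayes, and SVM classifiers. The toy English corpus and the names `BigramLM`, `lm_features`, and `classify` are hypothetical.

```python
import math
from collections import Counter


class BigramLM:
    """Bigram language model with add-one smoothing (an assumed choice)."""

    def __init__(self, sentences, vocab):
        # +1 accounts for the end-of-sentence marker as a possible next token.
        self.vocab_size = len(vocab) + 1
        self.context_counts = Counter()
        self.bigram_counts = Counter()
        for tokens in sentences:
            padded = ["<s>"] + tokens + ["</s>"]
            for prev, cur in zip(padded, padded[1:]):
                self.context_counts[prev] += 1
                self.bigram_counts[(prev, cur)] += 1

    def log_prob(self, tokens):
        """Smoothed log-probability of a tokenized sentence."""
        padded = ["<s>"] + tokens + ["</s>"]
        total = 0.0
        for prev, cur in zip(padded, padded[1:]):
            numerator = self.bigram_counts[(prev, cur)] + 1
            denominator = self.context_counts[prev] + self.vocab_size
            total += math.log(numerator / denominator)
        return total


def lm_features(tokens, models):
    """One feature per class: average log-probability per predicted token."""
    steps = len(tokens) + 1  # every token plus the end marker
    return [models[label].log_prob(tokens) / steps for label in sorted(models)]


def classify(tokens, models):
    """Stand-in classifier: pick the class whose model scores highest."""
    labels = sorted(models)
    scores = lm_features(tokens, models)
    return labels[scores.index(max(scores))]


# Hypothetical toy corpus, already tokenized.
corpus = {
    "positive": [["i", "love", "this", "movie"],
                 ["great", "film", "i", "love", "it"]],
    "negative": [["i", "hate", "this", "movie"],
                 ["terrible", "film", "i", "hate", "it"]],
}
vocab = {tok for sents in corpus.values() for sent in sents for tok in sent}
models = {label: BigramLM(sents, vocab) for label, sents in corpus.items()}

print(classify(["i", "love", "it"], models))
print(lm_features(["i", "love", "it"], models))
```

In a fuller pipeline, the vector returned by `lm_features` would be concatenated with other features (for example TF-IDF weights) and passed to a trained classifier rather than compared directly.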
Keywords/Search Tags: text sentiment recognition, text feature extraction, language model, Word2Vec, deep learning, convolutional neural network, recurrent neural network