Research On Text Sentiment Analysis Method Oriented Towards Data Characteristics

Posted on: 2021-03-06
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y Li
Full Text: PDF
GTID: 1368330620463229
Subject: Computer Science and Technology
Abstract/Summary:
Social media and e-commerce platforms provide hundreds of millions of users with convenient services for work, daily life and social entertainment. Vast amounts of text data are scattered across these platforms, and the user-generated text among them contains rich sentiment information. The central goal of text sentiment analysis technology is to analyze and mine the sentiment information hidden in these data in depth so that it can serve social management and business operations. Text sentiment analysis refers to the process of extracting, processing, analyzing and reasoning about the sentiment information in text data by combining natural language processing and machine learning techniques. The text data in social media are numerous and complex. Observation and statistical analysis show that social media data are characterized by category imbalance, a lack of labeled data, obscure sentiment expression and diverse granularity of sentiment carriers, all of which pose great challenges to text sentiment analysis. Focusing on these data characteristics, and centering on text sentiment classification, rhetorical question identification, irony identification and explainable recommendation modeling, this thesis carries out systematic and in-depth research to develop the theory and methods of text sentiment analysis by combining data sampling, semi-supervised learning, embedded representation, deep learning and other technologies. The main research contents and innovations are as follows:

(1) Local dense mixed region down-sampling + global rebalancing for imbalanced text sentiment classification
Imbalanced text sentiment classification: The sentiment category imbalance of review data in social media may bias the classification model and degrade its classification performance. This thesis proposes an imbalanced text sentiment classification method based on local dense mixed region down-sampling and global rebalancing (LDMRC+SS/RS). In this method, an undirected complete graph composed of minority-class samples is constructed in a locally dense boundary region, and local balance is achieved by removing the majority-class samples that are closest to the edges of the complete graph; this forms the core down-sampling algorithm, LDMRC. On this basis, the data are globally rebalanced by using SMOTE up-sampling (SS) or random down-sampling (RS). The experimental results on 8 Chinese and English imbalanced datasets show that LDMRC is superior to BRC on various evaluation measures, and that LDMRC+SS/RS is generally superior to LDMRC alone, which verifies the effectiveness of the proposed method.

(2) Cooperative hybrid semi-supervised learning for text sentiment classification
Text sentiment classification with insufficient labeled data: The lack of high-quality labeled data poses a serious challenge to supervised learning. This thesis proposes a cooperative hybrid semi-supervised learning method for text sentiment classification. Three new concepts are introduced to measure the characteristics of a sample: the cluster similarity of a sample, and the uncertainty and the reliability of a sample with respect to a learner. A method for selecting the initial seed set is presented that uses cluster similarity and clustering, which preserves, to a certain extent, the consistency between the distribution of the initial seed set and that of the whole data. The uncertainty and reliability of a sample with respect to a learner provide the basis for selecting pseudo-labeled samples and ensure the quality of the extended training set. The designed heterogeneous cooperative rotation iterative training strategy helps to construct a better ensemble classifier. A series of experiments on 8 Chinese and English datasets verifies the effectiveness of the proposed method.

(3) Rhetorical question identification based on automatic language feature acquisition
Rhetorical question identification: To solve the problem of automatic feature extraction, an automatic feature extraction model (Auto F) based on Bi-LSTM and an attention mechanism is proposed. This model uses Bi-directional Long Short-Term Memory (Bi-LSTM) to embed a sentence, and then attends to the words of the sentence through the sentence's label, which helps to obtain word features that reflect the context information and describe the characteristics of rhetorical questions. For the problem of rhetorical question identification, an identification model (Auto F+AOA) is proposed in which feature information is fused into the sentence by the Attention-over-Attention (AOA) mechanism. The model uses the stacked attention mechanism AOA to fuse the feature sequence information of the target sentence into its embedding representation, which is then used for rhetorical question identification. Multi-group comparison experiments on Weibo data show that, compared with existing methods, the proposed method significantly improves the identification of Chinese rhetorical questions.

(4) Irony identification based on multi-information fusion
Irony identification: For the problem of explicit feature extraction, an extraction model based on Bidirectional Encoder Representations from Transformers (BERT) and label attention is proposed. The model uses BERT to embed a sentence, attends to the words in the sentence with the sentence sentiment label, and then selects the words, phrases, punctuation marks and symbols that receive larger attention weights as explicit features. For the problem of extracting information about sentiment polarity reversal, a method is proposed that builds on the sentiment classification of the target sentence and its surrounding text (clauses and previous sentences) and describes the polarity reversal through the difference between the sentence representations. For the irony identification problem, an information fusion method based on the stacked attention mechanism AOA and vector concatenation is proposed. The final representation of the sentence, which fuses the feature information and the environmental sentiment information, is used for irony identification. Comparative experiments on the IAC and Reddit movies datasets show that the proposed method is significantly superior to other existing methods.

(5) Explainable recommendation based on sentiment analysis of aspect terms
Explainable recommendation: In product review data, aspect terms can serve as a medium for building relationships between users and products. For the problem of extracting the sentiment information of aspect terms, a sequence labeling model built on BERT is used to obtain the sentiment information of aspect terms in the review data automatically, which serves the subsequent user-product relationship modeling. For the problem of representing the user-product relationship, the user's preference degree for an aspect term and the reputation contribution degree of an aspect term to a product are defined to describe user preference and product reputation from the perspective of aspect terms. An associated bipartite graph is then proposed based on the user-product relationship. To enhance the reasoning ability of the recommendation system, a graph neural network updating strategy based on the attention of aspect terms to users or products is proposed. Finally, a stable user-product correlation bipartite graph neural network is trained by constructing the model's loss function from the users' ratings of products. The experiment on the Restaurant dataset shows that the explainable recommendation model based on aspect term sentiment analysis can make full use of the sentiment information of aspect terms in product review data to improve the interpretability of recommendation results.
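
The global rebalancing step described in contribution (1) can be illustrated with a minimal sketch. It covers only the two generic options named in the abstract, random down-sampling (RS) and SMOTE-style up-sampling (SS); it does not implement LDMRC itself, whose details are not given here. The function names, the neighbour count `k`, the seeds and the toy points are all assumptions made for illustration.

```python
import random


def random_downsample(majority, target_size, seed=0):
    """RS option: keep a random subset of the majority class."""
    rng = random.Random(seed)
    return rng.sample(majority, min(target_size, len(majority)))


def smote_like_upsample(minority, target_size, k=2, seed=0):
    """SS option: add synthetic minority points by interpolating
    between a minority sample and one of its k nearest minority neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    synthetic = []
    while len(minority) + len(synthetic) < target_size:
        i = rng.randrange(len(minority))
        base = minority[i]
        others = [p for j, p in enumerate(minority) if j != i]
        # Rank the remaining minority samples by squared Euclidean distance.
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, neighbour)))
    return list(minority) + synthetic


# Toy 2-D feature vectors (hypothetical values, not from the thesis).
minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
majority = [(float(i), 5.0) for i in range(10)]

upsampled_minority = smote_like_upsample(minority, target_size=10)
downsampled_majority = random_downsample(majority, target_size=3)
print(len(upsampled_minority), len(downsampled_majority))
```

In practice the same idea would be applied to text feature vectors after the local down-sampling step, with the choice between SS and RS depending on how much data can be discarded.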
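
The label-attention idea used in contributions (3) and (4), attending to the words of a sentence with its label and keeping the words with the largest weights as features, can be sketched as follows. This is a simplified illustration, not the thesis model: the dot-product scoring, the function names and the toy vectors (which stand in for Bi-LSTM or BERT outputs) are all assumptions.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def label_attention(word_vectors, label_vector):
    """Attention weight per word: softmax of dot(word vector, label vector).
    Dot-product scoring is an assumption; the abstract does not specify it."""
    scores = [sum(w * l for w, l in zip(vec, label_vector)) for vec in word_vectors]
    return softmax(scores)


def top_features(tokens, weights, k=2):
    """Return the k tokens that receive the largest attention weights."""
    ranked = sorted(zip(tokens, weights), key=lambda pair: pair[1], reverse=True)
    return [token for token, _ in ranked[:k]]


# Hypothetical contextual word vectors and label embedding.
tokens = ["oh", "great", ",", "another", "delay", "!"]
word_vectors = [[0.1, 0.0], [0.9, 0.2], [0.0, 0.0], [0.2, 0.1], [0.3, 0.9], [0.5, 0.5]]
label_vector = [1.0, 1.0]

weights = label_attention(word_vectors, label_vector)
features = top_features(tokens, weights, k=2)
print(features)
```

The selected tokens would then play the role of the feature sequence that a later fusion step combines with the sentence representation.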
Keywords/Search Tags:Text sentiment analysis, Rhetorical implicit sentiment, Data sampling, Semi-supervised learning, Explainable recommendation