
Research On Cross-domain Sentiment Analysis Methods Towards Review Texts

Posted on: 2023-08-28
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y P Fu
Full Text: PDF
GTID: 1528306845497114
Subject: Communication and Information System
Abstract/Summary:
With the development of information technology, pervasive network services and numerous mobile applications have given people a convenient way to express their opinions on the Internet, producing a large volume of review texts that carry sentiment. How to make machines understand the semantics of these reviews and judge their sentiment tendencies, so as to support upper-level applications such as search ranking, product recommendation, transaction decision-making, and social governance, has become an issue of common concern in academia and industry. In practice, review texts come from different domains. Reviews in different domains target different objects, and their language and sentiment vocabulary differ. A sentiment classification model trained on a single domain does not account for these domain differences, so it is not suitable for directly predicting the sentiment of reviews from other domains. Research on cross-domain sentiment analysis has therefore become a key task for intelligent text processing and its applications.

Although existing cross-domain sentiment analysis methods have made some progress, they still have many limitations because of cross-domain semantic differences and the complexity of sentiment expression. For example: when learning cross-domain word vectors, language models cannot encode the semantic and sentiment discrepancies between domains; coarse-grained assessment of domain discrepancy measures domain differences inaccurately, which makes domain shift hard to eliminate; failure to exploit the sentiment words in texts from different domains leaves sentiment transfer short of sentiment information; and the lack of model optimization for data quality across domains leads to severe negative transfer.

This thesis addresses these problems for review texts from different domains and studies cross-domain sentiment analysis from four aspects: the representation of cross-domain sentiment word vectors, domain adaptation in cross-domain sentiment transfer, the use of sentiment information across domains, and data quality optimization across domains. A corresponding model is constructed for each aspect. This work was supported by several funds, including the National Natural Science Foundation of China (No. 62173026), the National Key Research and Development Program of China (No. 2018YFC0831306), and the Graduate Innovation Fund (No. 2019YJS006). The main research work and innovations of this thesis are as follows:

1. We study how sentiment, domain, and grammatical discrepancies in cross-domain review texts affect word-vector representations, and propose a cross-domain sentiment word vector learning method that integrates multiple kinds of knowledge. General language models based on maximizing co-occurrence probability often cannot distinguish the sentiment polarity of words, and the vectors they produce for sentiment words cannot capture sentiment discrepancies across domains. To address these problems, we study how to vectorize words in different domains. First, the model mines deep relationships between words and their contexts by combining a language model with a graph convolutional network, and combines the given sentiment knowledge of the source domain with the syntactic and semantic knowledge of the context to generate knowledge-rich source-domain word embeddings. Second, we propose a weighted transfer method that transfers sentiment information from the source domain to the target domain and fuses it with the syntactic and semantic information of target-domain contexts to generate knowledge-rich target-domain word embeddings. Experiments on the Amazon product review dataset show that the resulting cross-domain sentiment word embeddings reflect cross-domain sentiment information better than existing word embeddings and effectively improve classification performance when used for cross-domain sentiment classification.

2. We study domain adaptation for the cross-domain sentiment classification task and propose a domain adaptation model based on a shrinkage discrepancy strategy. Dispersed sample features within a domain increase the number of samples near the classification decision boundary, which leads to misclassification. Coarse-grained domain distance calculations ignore the class-level differences between samples of the two domains, so they evaluate domain discrepancy rigidly and cannot resolve domain shift. To address these problems, we first propose a shrinkage subspace strategy based on the feature structure within each domain: by shrinking the feature subspace of each class of samples in both domains, it reduces the number of samples near the decision boundary and thus alleviates the misclassification caused by sample dispersion. Second, we propose a weighted domain discrepancy strategy that evaluates discrepancy dynamically via optimal transport, so that the domain difference is no longer a fixed distance but adapts dynamically as the classification model is optimized. Experiments show that, compared with traditional domain adaptation methods, this model eliminates domain shift more accurately and effectively achieves cross-domain sentiment transfer.

3. We study the extraction of sentiment words and the use of sentiment information in cross-domain texts, and propose a cross-domain sentiment classification model that integrates key pivots and non-pivots. Sentiment words carry the key sentiment information in cross-domain texts and directly affect the sentiment polarity of reviews, yet there is no effective method for extracting and using sentiment words across domains. To address this, we study how to extract domain-shared sentiment words (pivots) and domain-private sentiment words (non-pivots). Using the relationship between sentiment labels and pivots, and between pivots and non-pivots, we propose the KPE-net and NKPE-net structures to extract pivots and non-pivots in different domains. Based on the linguistic characteristics of sentiment words, we identify among the pivots and non-pivots the key sentiment words that decide sentiment polarity, and propose a method that converts these key sentiment words into sentiment vectors to generate sentiment factors. Combining the sentiment factors with text features, we build a sentiment-sensitive network using a hierarchical attention network, which makes effective use of sentiment information in cross-domain sentiment transfer. Experiments show that the model outperforms the baseline models in target-domain prediction accuracy and gives an intuitive explanation of the information transferred across domains.

4. We study source domain selection and multi-source sentiment transfer for multi-source review texts, and propose a multi-source cross-domain sentiment classification model based on multi-source selection and contrastive transfer. Among the available source domains, some may be irrelevant or even contrary to the target domain, and these directly cause incorrect transfer to the target domain. Applying the same sentiment transfer model from every source domain to the target domain also causes negative transfer to varying degrees. To address these problems, we propose two multi-source selection strategies, suited to different degrees of correlation between the source and target domains, which screen out strongly correlated source domains for sentiment prediction and so eliminate the effects of irrelevant or adverse source domains. We propose a contrastive transfer model that combines contrastive learning with domain adaptation and transfers sentiment information from each source domain to the target domain in a feature-driven manner. Finally, according to the different correlations between the selected source domains and the target domain, we propose a weighted classification mechanism that transfers sentiment information from each source domain to the target domain to a different degree. Experiments show that the model selects the most relevant source domains and achieves higher multi-source cross-domain sentiment classification accuracy than the baseline models.
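The first contribution combines a language model with a graph convolutional network to mine word-context relations, but the abstract does not give the layer definition. As a hedged illustration only, the sketch below implements the widely used GCN propagation rule H' = ReLU(D^-1/2 (A + I) D^-1/2 H W) (Kipf and Welling) on a toy word graph. The graph, feature sizes, and the helper names `matmul` and `gcn_layer` are assumptions for illustration, not the thesis's architecture.

```python
import math

def matmul(a, b):
    """Multiply two dense matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def gcn_layer(adj, h, w):
    """One GCN layer: ReLU(D^-1/2 (A + I) D^-1/2 H W).

    adj: n x n symmetric adjacency matrix of a (hypothetical) word graph
    h:   n x d node features, e.g. contextual word vectors
    w:   d x k weight matrix
    """
    n = len(adj)
    # Add self-loops so each word keeps part of its own representation.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    # Node degrees in the self-looped graph.
    deg = [sum(row) for row in a_hat]
    # Symmetric normalisation D^-1/2 (A + I) D^-1/2.
    norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]
    out = matmul(matmul(norm, h), w)
    # Element-wise ReLU.
    return [[max(0.0, x) for x in row] for row in out]

# Usage: three words linked in a chain (e.g. by dependency edges), 2-d features.
adj = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
h = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
w = [[1.0, 0.0], [0.0, 1.0]]
print(gcn_layer(adj, h, w))
```

Each output row mixes a word's features with those of its graph neighbours; stacking such layers on top of language-model vectors is one plausible way to inject syntactic context.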
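The shrinkage subspace strategy in the second contribution is described only at a high level. One common way to "shrink" each class's feature region is a center-loss-style compactness term: the mean squared distance of every sample to its class centroid. The sketch below computes that term; using target pseudo-labels in place of true labels, and the names `centroid` and `class_compactness`, are assumptions rather than the thesis's formulation.

```python
def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    d = len(vectors[0])
    return [sum(v[k] for v in vectors) / len(vectors) for k in range(d)]

def class_compactness(features, labels):
    """Mean squared distance of each sample to the centroid of its class.

    Smaller values mean tighter class clusters, i.e. fewer samples drifting
    towards the decision boundary. Labels may be source labels or
    (hypothetically) target pseudo-labels.
    """
    by_class = {}
    for f, y in zip(features, labels):
        by_class.setdefault(y, []).append(f)
    total = 0.0
    for members in by_class.values():
        c = centroid(members)
        total += sum(sum((x - ck) ** 2 for x, ck in zip(f, c)) for f in members)
    return total / len(features)

# Usage: the same labels with tight versus dispersed features.
labels = [1, 1, 0, 0]
tight = [[1.0, 0.0], [1.0, 0.2], [-1.0, 0.0], [-1.0, 0.2]]
spread = [[2.0, 0.0], [0.0, 0.2], [-2.0, 0.0], [0.0, 0.2]]
print(class_compactness(tight, labels), class_compactness(spread, labels))
```

Minimising such a term alongside the classification loss pulls same-class samples together, which is the qualitative effect the abstract attributes to shrinking the feature subspaces.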
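The weighted domain discrepancy strategy of the second contribution evaluates domain difference with optimal transport. A standard way to compute an entropy-regularised optimal transport cost is the Sinkhorn iteration, sketched below between source and target feature sets with a squared-Euclidean cost. The optional `label_penalty` term, which makes transport between differently labelled samples more expensive, is a hypothetical stand-in for the thesis's class-aware weighting; `eps` and `n_iters` are assumed defaults.

```python
import math

def sinkhorn_distance(xs, xt, ys=None, yt=None, label_penalty=0.0, eps=0.5, n_iters=200):
    """Entropy-regularised optimal transport cost between two point clouds.

    xs, xt: lists of feature vectors (source / target), uniform sample weights.
    ys, yt: optional class labels (target ones would be pseudo-labels); pairs
            with different labels get an extra `label_penalty` (hypothetical
            class-aware weighting).
    """
    n, m = len(xs), len(xt)
    cost = [[sum((a - b) ** 2 for a, b in zip(xs[i], xt[j])) for j in range(m)]
            for i in range(n)]
    if ys is not None and yt is not None:
        for i in range(n):
            for j in range(m):
                if ys[i] != yt[j]:
                    cost[i][j] += label_penalty
    # Gibbs kernel K = exp(-C / eps).
    k = [[math.exp(-c / eps) for c in row] for row in cost]
    a, b = [1.0 / n] * n, [1.0 / m] * m
    u, v = [1.0] * n, [1.0] * m
    for _ in range(n_iters):
        # Alternate scalings so the plan's row and column sums match a and b.
        u = [a[i] / sum(k[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(k[i][j] * u[i] for i in range(n)) for j in range(m)]
    # Transport plan P = diag(u) K diag(v); return <P, C>.
    return sum(u[i] * k[i][j] * v[j] * cost[i][j] for i in range(n) for j in range(m))

# Usage: identical domains give a near-zero cost, shifted domains a large one.
xs = [[0.0, 0.0], [1.0, 1.0]]
print(sinkhorn_distance(xs, xs), sinkhorn_distance(xs, [[3.0, 3.0], [4.0, 4.0]]))
```

Because the cost matrix can be recomputed from the current features (and pseudo-labels) at every training step, the measured discrepancy changes as the model is optimised, in the spirit of the dynamic evaluation the abstract describes. For small `eps` relative to the costs, the plain-domain updates converge slowly or underflow; log-domain Sinkhorn is the usual remedy.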
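KPE-net and NKPE-net in the third contribution are the thesis's own networks and are not specified in the abstract. For orientation only, the sketch below shows a classic, non-neural baseline for the same sub-task: choosing pivots as words that are frequent in both domains and carry high mutual information with the source sentiment label, as in structural correspondence learning. The thresholds `min_count` and `top_k` and the function names are assumptions.

```python
import math
from collections import Counter

def mutual_information(word, docs, labels):
    """MI (in nats) between the binary feature 'word occurs' and the label."""
    n = len(docs)
    joint = Counter((word in set(d), y) for d, y in zip(docs, labels))
    mi = 0.0
    for (x, y), c in joint.items():
        p_xy = c / n
        p_x = sum(v for (xx, _), v in joint.items() if xx == x) / n
        p_y = sum(v for (_, yy), v in joint.items() if yy == y) / n
        mi += p_xy * math.log(p_xy / (p_x * p_y))
    return mi

def select_pivots(src_docs, src_labels, tgt_docs, min_count=2, top_k=3):
    """Words frequent in both domains, ranked by MI with the source label."""
    src_df = Counter(w for d in src_docs for w in set(d))
    tgt_df = Counter(w for d in tgt_docs for w in set(d))
    shared = [w for w in src_df if src_df[w] >= min_count and tgt_df[w] >= min_count]
    ranked = sorted(shared, key=lambda w: (-mutual_information(w, src_docs, src_labels), w))
    return ranked[:top_k]

# Usage: toy "movies" source domain and "electronics" target domain.
src = [["the", "great", "plot", "good"], ["the", "good", "acting", "great"],
       ["the", "boring", "plot", "bad"], ["the", "bad", "acting", "boring"]]
src_labels = [1, 1, 0, 0]
tgt = [["the", "great", "battery", "good"], ["the", "bad", "screen"],
       ["the", "good", "price"], ["the", "bad", "battery", "great"]]
print(select_pivots(src, src_labels, tgt))
```

Shared but sentiment-neutral words such as "the" get zero mutual information and fall to the bottom of the ranking, while domain-specific words such as "plot" or "battery" never qualify as pivots; in pivot-based methods these domain-private words are the non-pivot candidates.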
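The contrastive transfer model in the fourth contribution combines contrastive learning with domain adaptation, but the abstract does not say how positive and negative pairs are formed. As an assumption, the sketch below treats a same-class (pseudo-labelled) source sample as the positive for a target anchor and other-class samples as negatives, and scores them with the standard InfoNCE loss over cosine similarities; `temperature` is an assumed default.

```python
import math

def cosine(a, b):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE loss for one anchor: minus the log-softmax score of the positive."""
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, neg) / temperature for neg in negatives]
    # Subtract the maximum for a numerically stable log-sum-exp.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_z - logits[0]

# Usage: a target feature (anchor) with an aligned versus a misaligned positive.
good = info_nce([1.0, 0.0], [1.0, 0.0], [[0.0, 1.0], [-1.0, 0.0]])
bad = info_nce([1.0, 0.0], [0.0, 1.0], [[1.0, 0.0], [-1.0, 0.0]])
print(good, bad)
```

Minimising this loss pulls cross-domain samples of the same sentiment together and pushes different sentiments apart, which is one feature-driven way to align each source domain with the target.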
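The fourth contribution also selects strongly correlated source domains and weights each selected source's prediction by its correlation with the target, without defining either step in the abstract. The sketch below assumes cosine similarity between mean feature vectors as the correlation score, a fixed `threshold` for selection, and softmax-normalised weights for combining per-source class probabilities; every name and default here is hypothetical.

```python
import math

def mean_vector(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    d = len(vectors[0])
    return [sum(v[k] for v in vectors) / len(vectors) for k in range(d)]

def cosine(a, b):
    """Cosine similarity, 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def select_sources(source_feats, target_feats, threshold=0.5):
    """Keep source domains whose mean feature is similar enough to the target's."""
    t_mean = mean_vector(target_feats)
    scores = {name: cosine(mean_vector(f), t_mean) for name, f in source_feats.items()}
    return {name: s for name, s in scores.items() if s >= threshold}

def weighted_predict(selected_scores, source_probs, temperature=1.0):
    """Combine per-source class probabilities with softmax(score / T) weights."""
    names = list(selected_scores)
    exps = [math.exp(selected_scores[n] / temperature) for n in names]
    z = sum(exps)
    n_classes = len(source_probs[names[0]])
    combined = [0.0] * n_classes
    for n, e in zip(names, exps):
        for c in range(n_classes):
            combined[c] += (e / z) * source_probs[n][c]
    return combined

# Usage: "kitchen" points the opposite way from the target and is dropped.
source_feats = {"books": [[1.0, 0.0], [0.9, 0.1]],
                "dvd": [[0.8, 0.2], [1.0, 0.0]],
                "kitchen": [[-1.0, 0.0], [-0.8, 0.2]]}
target_feats = [[1.0, 0.1], [0.9, 0.0]]
selected = select_sources(source_feats, target_feats)
probs = {"books": [0.2, 0.8], "dvd": [0.4, 0.6]}  # per-source [neg, pos] for one review
print(selected, weighted_predict(selected, probs))
```

Selection removes sources likely to cause negative transfer outright, while the softmax weights let the remaining sources contribute in proportion to their similarity; lowering `temperature` sharpens the weighting towards the closest source.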
Keywords/Search Tags: Cross-domain, Sentiment Analysis, Word Embedding, Domain Adaptation, Multi-source Selection, Deep Learning