
The Research On Sentiment Analysis Of Movie Reviews Based On Improved Word2vec And Ensemble Learning

Posted on: 2022-04-30
Degree: Master
Type: Thesis
Country: China
Candidate: S S Lin
Full Text: PDF
GTID: 2505306548461404
Subject: Computer Science and Technology
Abstract/Summary:
Text sentiment analysis is an important and popular branch of NLP that spans many fields and has high research value for analyzing social content and comment feedback. However, models trained on general-purpose text often fail to achieve the expected results when applied to texts in a specific field, because the training samples are too generic. Targeting a specific field usually requires in-depth study of its text features, a process that is generally complex and yields models that are difficult to generalize. In addition, researchers sometimes choose a classifier by considering only its performance in a specific situation, which limits practical applicability. Sentiment analysis algorithms that simplify the representation of domain text features while improving model accuracy are therefore a research hotspot. Taking movie reviews as an example, this thesis improves several aspects of text sentiment analysis in response to these problems. The main research contents are as follows:

(1) To address the lack of available data sets and poor word segmentation in specific research fields, this thesis first studies anti-crawler mechanisms and combines multiple strategies to acquire data sets quickly and effectively. The collected data covers mixed movie reviews, with the various sample types kept uniformly distributed, which eliminates the influence of any single type of data and broadens the applicability of the analysis system. In the preprocessing stage, three evaluation indicators are used to mine new domain-specific words, which expands the word segmentation dictionary and improves segmentation accuracy.

(2) To reduce the burden of manual data labeling, this thesis extracts new domain-specific words and uses PMI (pointwise mutual information) to determine their emotional tendency, expanding a basic emotional dictionary into an emotional dictionary for the film review domain. The data is then annotated automatically by combining weakly labeled information with this dictionary. Compared with the original labeling algorithm, the proposed algorithm increases the data adoption rate and is more practical.

(3) To improve sentiment analysis in the film review domain, this thesis proposes a sentiment analysis method based on improved word2vec and ensemble learning. The basic design is as follows. First, the corpus used to compute IDF is improved and core domain emotional words are added, so that IDF better reflects the actual situation. Second, the TF-IDF algorithm is used to exponentially weight the word2vec vectors, which integrates both the semantic relationships between words and the importance of each word into the model. Finally, drawing on the ensemble learning idea of "integrating the strengths of different models" and its excellent performance in various fields, the Stacking method is used to train on and classify the labeled emotional data. To check the reliability of the proposed algorithm, control groups were set up for comparison experiments using multiple weighting methods combined with SVM, Text Rank and other similar algorithms. The final experiments indicate that the proposed method performs well on all indicators and achieves the best classification results.

In summary, the algorithms proposed in this thesis effectively address the above problems. From data set construction to annotation, and from feature extraction to model training, the expected goals have been achieved. The work not only simplifies the representation of domain text features but also improves classification performance, forming a fast and effective sentiment analysis system.
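
The feature-weighting idea in step (3) can be illustrated with a minimal sketch. It is not the thesis's implementation: the abstract does not specify the exact form of the exponential weighting or the improved IDF corpus, so the code below assumes a plain TF-IDF weighted average of word vectors with a smoothed IDF. The function names `idf_table` and `weighted_doc_vector`, the toy corpus, and the hand-written two-dimensional vectors (standing in for trained word2vec embeddings) are all hypothetical.

```python
import math
from collections import Counter


def idf_table(corpus):
    """Smoothed IDF per term: log((1 + N) / (1 + df)) + 1 (an assumed form)."""
    n_docs = len(corpus)
    df = Counter()
    for doc in corpus:
        df.update(set(doc))  # count each term at most once per document
    return {t: math.log((1 + n_docs) / (1 + c)) + 1.0 for t, c in df.items()}


def weighted_doc_vector(doc, vectors, idf):
    """TF-IDF weighted average of the word vectors of the tokens in doc.

    Tokens without a vector or an IDF entry are skipped. A document with
    no usable tokens maps to the zero vector.
    """
    tf = Counter(t for t in doc if t in vectors and t in idf)
    dim = len(next(iter(vectors.values())))
    total = [0.0] * dim
    weight_sum = 0.0
    for term, count in tf.items():
        weight = (count / len(doc)) * idf[term]  # linear TF-IDF weight
        weight_sum += weight
        for i, x in enumerate(vectors[term]):
            total[i] += weight * x
    if weight_sum == 0.0:
        return total
    return [x / weight_sum for x in total]


# Toy tokenized reviews and toy vectors (placeholders for word2vec output).
corpus = [
    ["plot", "great", "acting", "great"],
    ["plot", "boring"],
    ["acting", "plot", "fine"],
]
vectors = {
    "plot": [0.0, 1.0],
    "great": [1.0, 0.0],
    "boring": [-1.0, 0.0],
    "acting": [0.0, 0.5],
    "fine": [0.5, 0.0],
}

idf = idf_table(corpus)
doc_vectors = [weighted_doc_vector(doc, vectors, idf) for doc in corpus]
for doc, vec in zip(corpus, doc_vectors):
    print(doc, [round(x, 3) for x in vec])
```

In this sketch a word such as "great", which appears in only one review, receives a higher IDF than "plot", which appears in all of them, so the resulting document vector leans toward the sentiment-bearing word. Vectors of this kind could then be passed to a stacked classifier.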
Keywords/Search Tags: Sentiment analysis, emotional dictionary, feature selection, machine learning