Applied Research In Sentiment Analysis With Machine Learning Model Based On Gradient Acceleration

Posted on: 2024-08-24
Degree: Master
Type: Thesis
Country: China
Candidate: S W Liu
Full Text: PDF
GTID: 2568307073976669
Subject: Applied statistics
Abstract/Summary:
Large-scale sentiment analysis requires models that can be trained quickly without sacrificing accuracy. Taking sentiment analysis of large-scale Twitter comment text as its application, this thesis accomplishes the following three tasks.

The first task is text vectorization of English Twitter comments. After collating the Twitter sentiment analysis dataset from the Kaggle website, non-English characters are removed by text cleaning, the text is pre-processed through tokenization, stemming, and lemmatization, and feature extraction and feature selection are then carried out. Feature vectorization starts from the bag-of-words model and is extended with N-grams; the bigram model (N=2) is selected according to its observed effect. TF-IDF is then used to weight the bigram frequencies, and the 2,000 bigrams with the highest importance are selected to form a key gram list that serves as the final text feature.

The second task is solving machine learning models for large-scale sentiment analysis with a gradient acceleration algorithm. Three models, Logistic Regression (LR), Support Vector Machine (SVM), and Naive Bayes (NB), are selected according to the characteristics of large-scale sentiment analysis. Starting from stochastic gradient descent, and drawing on two strategies for accelerating along the gradient direction, momentum acceleration and variance reduction, this thesis combines the advantages of both to improve mini-batch gradient descent. It proposes a mini-batch gradient variance reduction algorithm with momentum acceleration and applies it to the Twitter comment sentiment analysis task. The application results show that the improved mini-batch gradient acceleration algorithm effectively reduces the number of iterations and accelerates learning.

The third task is comparing the performance and time cost of the different machine learning models trained with gradient acceleration algorithms in a practical application. The data are randomly divided into training and test sets in an 8:2 ratio, and the three models are trained separately. The results show that the SVM model fits best on both the training and test sets but also takes the longest to train; the Bernoulli NB model trains fastest but fits worst of the three; and the LR model falls between the other two in both training time and fit. The thesis therefore concludes that the LR model is the most suitable for Twitter comment sentiment analysis.
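The feature pipeline described in the abstract (bigrams, TF-IDF weighting, selection of the most important bigrams) can be sketched roughly as follows. This is only an illustration under stated assumptions: the function names `bigrams` and `top_k_bigram_features` are hypothetical, stemming and lemmatization are omitted, the smoothed IDF formula and the "summed TF-IDF" importance score are assumptions (the abstract does not give the exact definitions), and the thesis uses 2,000 features where the toy example would use a much smaller k.

```python
import math
import re
from collections import Counter


def bigrams(text):
    """Lowercase, replace non-English letters with spaces, return adjacent word pairs."""
    tokens = re.sub(r"[^a-z]+", " ", text.lower()).split()
    return [f"{a} {b}" for a, b in zip(tokens, tokens[1:])]


def top_k_bigram_features(docs, k):
    """Select the k bigrams with the highest summed TF-IDF weight and vectorize docs."""
    grams_per_doc = [Counter(bigrams(d)) for d in docs]
    n_docs = len(docs)
    # Document frequency: number of documents that contain each bigram.
    doc_freq = Counter(g for counts in grams_per_doc for g in counts)
    # Smoothed IDF (an assumption; the abstract does not state the formula used).
    idf = {g: math.log((1 + n_docs) / (1 + df)) + 1 for g, df in doc_freq.items()}
    # Importance of a bigram: its TF-IDF weight summed over the corpus (assumption).
    importance = Counter()
    for counts in grams_per_doc:
        for g, tf in counts.items():
            importance[g] += tf * idf[g]
    # Highest importance first; ties are broken alphabetically for determinism.
    ranked = sorted(importance.items(), key=lambda kv: (-kv[1], kv[0]))
    vocab = [g for g, _ in ranked[:k]]
    # One TF-IDF vector per document over the selected key gram list.
    vectors = [[counts[g] * idf[g] for g in vocab] for counts in grams_per_doc]
    return vocab, vectors
```

Calling `top_k_bigram_features(docs, 2000)` on a list of cleaned tweets would return the key gram list and one weighted vector per tweet; bigrams absent from a tweet get weight zero.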
Keywords/Search Tags: Large-scale sentiment analysis, Machine learning, Gradient acceleration, Improved mini-batch gradient acceleration algorithm
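The abstract describes the improved mini-batch gradient acceleration algorithm only at a high level, so the following is a minimal sketch of one plausible reading, not the thesis's actual method: an SVRG-style variance-reduced mini-batch gradient combined with heavy-ball momentum, applied to logistic regression. The function names, the update rule, and all hyperparameters (`lr`, `beta`, `batch`, `epochs`, `inner`) are assumptions chosen for illustration.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def batch_grad(w, X, y, idx):
    """Average logistic-loss gradient over the rows listed in idx."""
    g = [0.0] * len(w)
    for i in idx:
        err = sigmoid(sum(wj * xj for wj, xj in zip(w, X[i]))) - y[i]
        for j, xj in enumerate(X[i]):
            g[j] += err * xj / len(idx)
    return g


def svrg_momentum(X, y, lr=0.1, beta=0.9, batch=4, epochs=30, inner=10, seed=0):
    """Mini-batch variance-reduced gradient descent with momentum (illustrative)."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    w, v = [0.0] * d, [0.0] * d
    for _ in range(epochs):
        w_snap = w[:]                             # snapshot of the weights
        mu = batch_grad(w_snap, X, y, range(n))   # full gradient at the snapshot
        for _ in range(inner):
            idx = rng.sample(range(n), batch)     # mini-batch without replacement
            g = batch_grad(w, X, y, idx)
            g_snap = batch_grad(w_snap, X, y, idx)
            # Variance reduction: correct the mini-batch gradient with the snapshot.
            vr = [gi - si + mi for gi, si, mi in zip(g, g_snap, mu)]
            # Momentum: accumulate a decaying sum of past corrected gradients.
            v = [beta * vi + ri for vi, ri in zip(v, vr)]
            w = [wi - lr * vi for wi, vi in zip(w, v)]
    return w
```

In this sketch each row of `X` is a feature vector (with a leading 1.0 if a bias term is wanted) and `y` holds 0/1 sentiment labels. The snapshot gradient costs one full pass per outer epoch, which is the usual trade-off of variance-reduction schemes of this kind.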