Design And Application Of A Sentiment Analysis System Based On BERT And K-Means

Posted on: 2024-03-07 | Degree: Master | Type: Thesis
Country: China | Candidate: Z C Yan
GTID: 2568307106989879 | Subject: Electronic information
Abstract/Summary:
With the rise of social media platforms such as Weibo and Zhihu, more and more platforms are emphasizing the social side of the Internet. Sites such as Douban, which combine books, movies, and other works with social networking, have become a new trend. In this context, sentiment analysis, an important research field in Natural Language Processing (NLP), has attracted growing attention from researchers. Sentiment analysis helps reveal people's emotional tendencies and attitudes toward specific topics, events, products, or services, which matters to enterprises, government agencies, marketers, public opinion analysts, and social science researchers. In the commercial field, for example, it can help companies better understand customer needs and market trends, and thereby improve their products and services, customer satisfaction, and brand value. Government agencies can use it to gauge public attitudes and reactions to policies and government services, so as to formulate and implement policies better and improve government credibility. Public opinion analysts can use it to monitor and analyze opinion dynamics on social media and to discover and respond to issues of public concern in a timely manner. Social science researchers can use it to study the patterns and changes of human emotion, psychology, and behavior.

However, the sentiment analysis task still faces several challenges:
(1) Because traditional word embeddings lack context information, their semantics are incomplete. On text with more complex expressions, the model cannot accurately determine part of speech and word meaning.
(2) Traditional topic modeling methods perform poorly on short text, where lexical sparsity is higher and contextual information is scarcer. In addition, as the data dimension increases, the similarities between vectors become hard to distinguish, which weakens text similarity algorithms.
(3) Traditional sentiment analysis methods mainly target coarse-grained sentiment at the document and sentence level, analyzing the text as a whole to obtain an overall sentiment tendency. This approach struggles to meet users' expectations for customization, especially when a text contains multiple evaluations with different sentiment polarities.

Focusing on these problems, the main work of this thesis is as follows:
1. Data collection and preprocessing: To perform sentiment analysis on user comments, we first collect and preprocess data, either from publicly available datasets or by crawling and manually annotating comments on specific products or events. Information extraction, Jieba Chinese word segmentation, and stop-word lists are then used to make the text data more suitable for sentiment analysis.
2. Using BERT for word embedding: To overcome the limitations of traditional word embeddings, such as polysemy and missing contextual information, we use the pre-trained BERT model to generate word vectors. We compare word vectors extracted from different layers of BERT and choose the best-performing option. We also study the properties of BERT word embeddings and their suitability for clustering, and propose using dimensionality reduction to make high-dimensional vectors work better in clustering algorithms and to improve the accuracy of similarity calculations. To address the uneven distribution of the word vector space and the inconsistent similarity between high-frequency and low-frequency words, we propose a whitening transformation to improve clustering accuracy.
3. Using clustering algorithms for topic modeling and sentiment analysis: We explore various clustering algorithms for topic modeling and sentiment analysis on the generated word vectors. By comparing our method with traditional probabilistic topic models such as Latent Dirichlet Allocation (LDA), we aim to demonstrate the feasibility and superiority of the proposed model in practical applications. The evaluation metrics are accuracy, precision (P), recall (R), and F1 score. Experimental results on different datasets show that the proposed model outperforms the baseline methods on these metrics, demonstrating its effectiveness.
4. Implementing a user review sentiment analysis system: We design and develop a user-friendly sentiment analysis system that integrates the models proposed in step 3. The system is built with popular front-end development frameworks and offers users an intuitive review analysis experience through features such as comment scraping, data preprocessing, and result visualization.

In summary, this thesis addresses challenges in sentiment analysis by collecting and preprocessing data, using BERT for word embedding, applying clustering algorithms to topic modeling and sentiment analysis, and developing a user-friendly sentiment analysis system. A series of experiments and evaluations shows that the proposed models are effective and outperform traditional methods, providing useful insights and tools for stakeholders in business, government, marketing, public opinion analysis, and social science research.
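Challenge (2) notes that similarities between vectors become hard to distinguish as dimensionality grows. The sketch below illustrates this general effect (distance concentration) on uniformly random vectors, not on BERT embeddings: it measures the relative contrast (max − min) / min of Euclidean distances from one query to many points. The function name `relative_contrast` and all parameters are hypothetical choices for illustration.

```python
import math
import random


def relative_contrast(dim, n_points=200, seed=0):
    """(max - min) / min of Euclidean distances from one random query
    to n_points random vectors, all drawn uniformly from [0, 1]^dim."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    dists = [
        math.dist(query, [rng.random() for _ in range(dim)])
        for _ in range(n_points)
    ]
    return (max(dists) - min(dists)) / min(dists)


# The ratio shrinks as dim grows: the nearest and farthest points end up
# at nearly the same distance, so similarity rankings carry less signal.
for dim in (2, 10, 100, 1000):
    print(dim, round(relative_contrast(dim), 3))
```

This is one motivation for the dimensionality reduction mentioned in step 2: fewer, more informative dimensions restore some contrast between near and far neighbors.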
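Step 1 (data collection and preprocessing) filters segmented comments against a stop-word table. The sketch below shows only that filtering step, assuming each comment has already been segmented into tokens (for example with Jieba's `jieba.lcut`). The token lists are hand-written so the example runs without Jieba installed, and the stop-word set and the names `STOPWORDS`, `remove_stopwords`, and `preprocess` are hypothetical; the thesis's own stop-word table is not given in the abstract.

```python
from collections import Counter

# Hypothetical stop-word set; the thesis uses its own stop-word table.
STOPWORDS = {"的", "了", "是", "很", "也"}


def remove_stopwords(tokens, stopwords=STOPWORDS):
    """Drop stop words and whitespace-only tokens from one segmented comment."""
    return [t for t in tokens if t.strip() and t not in stopwords]


def preprocess(segmented_comments, stopwords=STOPWORDS):
    """Filter every comment and count the surviving terms across the corpus."""
    cleaned = [remove_stopwords(toks, stopwords) for toks in segmented_comments]
    vocab = Counter(t for toks in cleaned for t in toks)
    return cleaned, vocab


# Tokens as a Chinese segmenter such as jieba.lcut might return them
# (written by hand here, not produced by Jieba).
comments = [
    ["这部", "电影", "的", "剧情", "很", "精彩"],
    ["演员", "的", "表演", "也", "很", "精彩"],
]
cleaned, vocab = preprocess(comments)
print(cleaned)
print(vocab["精彩"])
```

Keeping the term counts alongside the cleaned tokens is a convenience: frequency information is useful later when comparing high-frequency and low-frequency words.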
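Step 2 proposes a whitening transformation for BERT vectors but the abstract does not give its exact form. A common formulation (often called BERT-whitening) maps the vectors to zero mean and identity covariance. Assuming N row vectors $x_i \in \mathbb{R}^{1 \times d}$ with a positive-definite covariance matrix, it can be written as:

```latex
\begin{aligned}
\mu &= \frac{1}{N}\sum_{i=1}^{N} x_i \\
\Sigma &= \frac{1}{N}\sum_{i=1}^{N} (x_i-\mu)^{\top}(x_i-\mu) = U\Lambda U^{\top} \\
W &= U\Lambda^{-1/2} \\
\tilde{x}_i &= (x_i-\mu)\,W \\
\frac{1}{N}\sum_{i=1}^{N} \tilde{x}_i^{\top}\tilde{x}_i &= W^{\top}\Sigma W
  = \Lambda^{-1/2}U^{\top}U\Lambda U^{\top}U\Lambda^{-1/2} = I
\end{aligned}
```

Here $U$ is orthogonal and $\Lambda$ is the diagonal matrix of eigenvalues of $\Sigma$. Keeping only the first $k$ columns of $W$ (those with the largest eigenvalues) yields $k$-dimensional vectors, so the same transform can also serve as the dimensionality reduction step; whether the thesis uses this truncation is an assumption.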
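The title names K-Means as the clustering algorithm used in step 3, but the abstract does not specify the distance, initialization, or stopping rule. The following is a minimal sketch under stated assumptions: plain Lloyd's K-Means with random initialization from the data (fixed seed), run on L2-normalized vectors, with small 2-D toy vectors standing in for (reduced) BERT embeddings. All function names are hypothetical.

```python
import math
import random


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)


def normalize(v):
    """Scale v to unit length, so squared Euclidean distance = 2 - 2 * cosine."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def sq_dist(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, iters=100, seed=0):
    """Plain Lloyd's K-Means; returns (labels, centroids)."""
    rng = random.Random(seed)
    centroids = [list(c) for c in rng.sample(vectors, k)]
    labels = [0] * len(vectors)
    for _ in range(iters):
        # Assignment step: index of the nearest centroid for each vector.
        new_labels = [
            min(range(k), key=lambda j: sq_dist(v, centroids[j])) for v in vectors
        ]
        # Update step: move each centroid to the mean of its members
        # (centroids are not re-normalized, unlike spherical K-Means).
        for j in range(k):
            members = [v for v, lab in zip(vectors, new_labels) if lab == j]
            if members:  # keep the old centroid if a cluster empties
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
    return labels, centroids


# Toy "embeddings": two groups pointing in clearly different directions.
points = [[1.0, 0.1], [0.9, 0.2], [1.0, 0.0], [0.1, 1.0], [0.0, 0.9], [0.2, 1.0]]
vecs = [normalize(p) for p in points]
labels, _ = kmeans(vecs, k=2)
print(labels)
```

Normalizing first is a design choice: on unit vectors, $\lVert a-b\rVert^2 = 2 - 2\cos(a,b)$, so minimizing Euclidean distance in K-Means is equivalent to maximizing cosine similarity, which is the usual similarity measure for text embeddings.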
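Step 3 evaluates the models with accuracy, precision (P), recall (R), and F1. The abstract does not say whether P, R, and F1 are reported per class, micro-averaged, or macro-averaged, so the sketch below computes them for a single positive class and also shows a macro-averaged F1. The label strings and function names are hypothetical.

```python
def classification_metrics(y_true, y_pred, positive="pos"):
    """Accuracy plus precision, recall and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every label seen in y_true or y_pred."""
    labels = sorted(set(y_true) | set(y_pred))
    return sum(classification_metrics(y_true, y_pred, lab)["f1"] for lab in labels) / len(labels)


y_true = ["pos", "pos", "neg", "neg", "pos", "neg"]
y_pred = ["pos", "neg", "neg", "pos", "pos", "neg"]
print(classification_metrics(y_true, y_pred))
print(macro_f1(y_true, y_pred))
```

Here TP = 2, FP = 1, and FN = 1 for the "pos" class, so P = R = F1 = 2/3 and accuracy is 4/6. Macro averaging weights each sentiment class equally, which matters when one polarity dominates the review data.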
Keywords/Search Tags: BERT, Topic Modeling, Sentiment Analysis, Text Clustering, Similarity