Sentiment Analysis Of Electricity Supplier Customer Reviews Based On LDA Topic Model

Posted on: 2018-07-26
Degree: Master
Type: Thesis
Country: China
Candidate: R Guo
GTID: 2359330518492924
Subject: Project management
Abstract/Summary:
With the rapid development of e-commerce platforms, the number of users keeps growing, and the volume of product reviews grows with it. Extracting relevant information from this material efficiently and accurately, and then analyzing it, is a major challenge for modern information science and technology. An e-commerce platform is a marketplace for all kinds of commodities, including electronic products. The spread of the Internet of Things in particular has brought a range of smart devices, such as smart wearables, smart home products and health-care devices, of which the smart bracelet is a typical representative. This paper takes user reviews of smart bracelet products on the Jingdong mall as its corpus and combines machine learning with the theory and methods of natural language processing. It models the sentiment polarity and the topics of the review text in order to propose more efficient and accurate text mining techniques. The techniques described here can be widely applied in e-commerce.

The paper provides technical guidance for text mining by following the concrete process step by step: data acquisition, text preprocessing, semantic analysis, sentiment classification and topic analysis, with emphasis on data acquisition, natural language processing, algorithm selection and sentiment classification. It first collects the review corpus for bracelet products on the Jingdong mall using Python, obtaining nearly 200 thousand user reviews. It then preprocesses the data: identical lines are eliminated so that only one copy of each review remains, and repeated single words, repeated multi-word sequences and repetition between clauses are screened out by traversing the words several times. Short reviews are removed next: a screening length is set, and reviews shorter than this preset value are deleted. Feature selection is based on TF-IDF: the TF-IDF value of each word is calculated and compared with a set threshold, words below the threshold are filtered out, and the remaining words are kept as features, each weighted by its TF-IDF value. Semantic analysis finds the highest-frequency words and then analyzes their meanings and relationships to obtain the users' evaluation or impression of the product from their most important comments.

For sentiment analysis, the paper uses Python's NLTK natural language processing toolkit and the sklearn toolkit, which includes classification algorithms, to train the classification models. Features are selected according to the chi-square statistic, and the corpus text is represented by these features. Classifiers are built with different classification algorithms and their accuracy is tested. The LDA model analysis mainly uses the Gensim package for Python: each review is split into a bag of words, a dictionary is generated, a corpus is built that turns the text into sparse vectors, and the number of topics is specified to train the LDA model, which ultimately yields multiple positive and negative topics. From these topics one can see directly which aspects of the goods are praised and which are criticized. After the model design is complete, experimental verification compares the accuracy and recall of the different models. The results show that the prediction accuracy of the models meets practical requirements.
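The preprocessing and TF-IDF feature-selection steps described above can be illustrated with a minimal sketch using only the Python standard library. This is not the thesis code: the function names, the whitespace tokenizer, the `min_tokens` and `threshold` values and the English toy reviews are all assumptions made for illustration. Real Jingdong reviews are in Chinese and would need a word segmenter, and only the simplest case of repetition (the same word repeated back to back) is handled here.

```python
import math
from collections import Counter


def collapse_repeats(tokens):
    """Drop a token when it merely repeats the token just before it."""
    out = []
    for tok in tokens:
        if not out or out[-1] != tok:
            out.append(tok)
    return out


def preprocess(reviews, min_tokens=3):
    """Remove duplicate reviews, collapse repeated words, drop short reviews.

    min_tokens is a hypothetical stand-in for the preset screening length.
    """
    seen = set()
    cleaned = []
    for text in reviews:
        text = text.strip()
        if text in seen:  # identical review already kept once
            continue
        seen.add(text)
        tokens = collapse_repeats(text.lower().split())  # toy tokenizer
        if len(tokens) >= min_tokens:
            cleaned.append(tokens)
    return cleaned


def tfidf(docs):
    """Return one {word: tf-idf} dict per document (tf = count / doc length)."""
    n_docs = len(docs)
    doc_freq = Counter(tok for doc in docs for tok in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            tok: (cnt / len(doc)) * math.log(n_docs / doc_freq[tok])
            for tok, cnt in counts.items()
        })
    return weights


def select_features(weights, threshold):
    """Keep only words whose TF-IDF weight reaches the threshold."""
    return [{t: w for t, w in doc.items() if w >= threshold} for doc in weights]


if __name__ == "__main__":
    toy_reviews = [
        "battery life is great great great",
        "battery life is great great great",  # exact duplicate
        "ok",                                 # too short
        "strap broke after one week",
    ]
    docs = preprocess(toy_reviews)
    for features in select_features(tfidf(docs), threshold=0.15):
        print(features)
```

The surviving words and their weights would then serve as the feature representation passed on to the classifiers and the topic model.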
Keywords/Search Tags: LDA topic model, Sentiment analysis, Semantic network