
Hot Spot Mining Of User Comments Based On Text Clustering

Posted on: 2021-05-21
Degree: Master
Type: Thesis
Country: China
Candidate: X Chen
Full Text: PDF
GTID: 2439330623970049
Subject: Applied statistics
Abstract/Summary:
In the era of big data, the volume of text data grows rapidly every day, which highlights the need for a systematic method of extracting valuable information from text efficiently. This thesis applies text mining to notebook computer sales: by mining user comments on notebook computers, it aims to identify what users focus on most when choosing between brands. The work is divided into three parts. The first part covers data acquisition and data preprocessing. The second part is an empirical analysis based on the core algorithms. The third part summarizes the thesis and gives an outlook on future work.

In the data acquisition and preprocessing part, the thesis first introduces the web crawler techniques in common use, including general-purpose, focused (topical), and incremental web crawlers. A Python crawler is then used to collect comments on notebook computers from Jingdong Mall. A series of text preprocessing steps follows: text cleaning, word segmentation, stop-word removal, high-frequency word statistics, and vector space representation of the text. These steps prepare the data for the subsequent empirical analysis.

In the empirical analysis part, the user comments are first examined with descriptive statistics. On this basis, an LDA topic model is constructed to extract topics from the comments of Huawei and Apple users. For Huawei user reviews, the five extracted topics are hardware configuration, logistics, customer service, appearance, and overall notebook performance. Huawei users comment most on mall logistics and on hardware configuration, which account for 36.19% and 31.82% of comments respectively, so these are the two topics they pay the most attention to. For Apple users, the five extracted topics are system, customer service, hardware configuration, logistics, and appearance. Comments on the system topic are the most numerous, accounting for 36.88%.

Building on the topic extraction, the thesis constructs a Gaussian mixture model to cluster the comments of Huawei and Apple users that fall under the hardware configuration topic. For Huawei users, the hot spots within hardware configuration mainly include the display screen, the body shell, the mouse and keyboard, and the battery. Users pay the most attention to the design of the Huawei notebook display screen, with related comments accounting for 43.46%. For Apple users, comments on hardware configuration mainly concern the display screen, the mouse and keyboard, the battery, and the sound quality of the sound card. Comments on the display screen are again the most numerous, accounting for 39.49%.

Finally, the thesis summarizes the research results, discusses future work, and puts forward suggestions for notebook manufacturers, Jingdong Mall, and potential consumers.
Keywords/Search Tags: Web crawler, User comments, Laptop, Topic model, Cluster analysis
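
The preprocessing steps named in the abstract (text cleaning, word segmentation, stop-word removal, high-frequency word statistics, and vector representation) can be illustrated with a minimal sketch. This is not the thesis's code. It assumes English toy comments split on whitespace, whereas the actual Jingdong Mall reviews are Chinese and would need a dedicated word segmenter. The sample comments, the stop-word list, and all function names are hypothetical.

```python
import re
from collections import Counter

# Hypothetical toy comments; the thesis works on reviews crawled from Jingdong Mall.
COMMENTS = [
    "The screen is bright and the keyboard feels great!!!",
    "Fast delivery, the logistics service was great.",
    "Battery life is good, and the screen is sharp.",
]

# Hypothetical stop-word list, far shorter than a real one.
STOP_WORDS = {"the", "is", "and", "was", "a", "an", "of"}


def clean(text):
    """Lowercase the text and replace every non-letter, non-space character with a space."""
    return re.sub(r"[^a-z\s]", " ", text.lower())


def tokenize(text):
    """Split cleaned text on whitespace and drop stop words."""
    return [tok for tok in clean(text).split() if tok not in STOP_WORDS]


def build_vocabulary(token_lists):
    """Return the sorted list of distinct tokens across all comments."""
    return sorted({tok for toks in token_lists for tok in toks})


def to_vector(tokens, vocabulary):
    """Represent one comment as term counts aligned with the vocabulary."""
    counts = Counter(tokens)
    return [counts[term] for term in vocabulary]


# Cleaning, segmentation, and stop-word removal for every comment.
token_lists = [tokenize(c) for c in COMMENTS]

# High-frequency word statistics over the whole corpus.
word_freq = Counter(tok for toks in token_lists for tok in toks)

# Bag-of-words vectors: one row per comment, one column per vocabulary term.
vocabulary = build_vocabulary(token_lists)
vectors = [to_vector(toks, vocabulary) for toks in token_lists]

print(word_freq.most_common(3))
```

Count vectors of this kind are the usual input to a topic model such as LDA. The resulting per-comment features could then be passed to a clustering model such as a Gaussian mixture, which is the two-stage order the abstract describes.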