| With the rapid development of Internet technology, people have gradually accepted and become keen on online shopping. At the same time, they have grown accustomed to reading product reviews before deciding whether to buy. These reviews contain not only users' interests and preferences but also information about the products themselves. How to extract valuable information from product review data has therefore become an urgent problem. First, Amazon mobile phone review data was collected with a web crawler and used as the research object of this study. The crawled data was then pre-processed and converted to text, which included data cleaning, missing-value handling, word segmentation, part-of-speech tagging, and stop-word removal. Noun filtering and synonym merging were applied to further narrow the set of candidate product feature words, and a Latent Dirichlet Allocation model was used to extract product feature information, after which the more frequent feature words were retained. Next, the positive and negative sentiment words from HowNet, the National Taiwan University NTUSD, the Tsinghua University Li Jun Chinese Derogatory Dictionary, and several dictionaries of unattributed origin were merged separately, and the adverbs from the HowNet dictionary were added, yielding a complete sentiment dictionary. To calculate the sentiment polarity of the product feature words, different weights were assigned to different words in the sentiment dictionary. The dimensionality of the modeling data was then reduced with principal component analysis, which preserved the useful information in the data while removing noise. Finally, the sentiment factor was added to a multiple linear regression model, support vector regression, and extreme gradient boosting to predict product sales volume. In the experiment, sales were represented by each product's sales ranking, so the task was to
predict the sales ranking. With multiple linear regression, the predictions showed over-fitting. When support vector regression and extreme gradient boosting were used for modeling, k-fold cross-validation was also applied, which further improved prediction accuracy and further alleviated over-fitting. Finally, a visualization platform was built in R with the shiny package, chosen for R's strong visualization support, so that the experimental results could be presented clearly. |
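
The weighted sentiment scoring of product feature words described above can be illustrated with a minimal sketch. It is not the study's implementation: the tiny lexicon, the adverb weights, the fixed token window, and the function name `feature_sentiment` are all hypothetical, and Python is used only for illustration. The sketch assumes that a sentiment word counts toward a feature word when it appears within a few tokens of it, and that an adverb immediately before a sentiment word scales its polarity.

```python
# Hypothetical toy lexicon; the study merged HowNet, NTUSD, and other dictionaries.
POLARITY = {"good": 1.0, "clear": 1.0, "bad": -1.0, "slow": -1.0}

# Hypothetical adverb weights; the study took its adverbs from HowNet.
ADVERB_WEIGHT = {"very": 1.5, "extremely": 2.0, "slightly": 0.5}


def feature_sentiment(tokens, feature, window=2):
    """Sum weighted polarities of sentiment words near each occurrence of `feature`.

    `window` is an assumed parameter: the number of tokens on each side of the
    feature word that are searched for sentiment words.
    """
    score = 0.0
    for i, token in enumerate(tokens):
        if token != feature:
            continue
        start = max(0, i - window)
        end = min(len(tokens), i + window + 1)
        for j in range(start, end):
            polarity = POLARITY.get(tokens[j])
            if polarity is None:
                continue  # not a sentiment word
            # An adverb directly before the sentiment word scales its weight.
            weight = ADVERB_WEIGHT.get(tokens[j - 1], 1.0) if j > 0 else 1.0
            score += weight * polarity
    return score


review = ["screen", "very", "clear", "but", "the", "battery", "is", "slow"]
screen_score = feature_sentiment(review, "screen")
battery_score = feature_sentiment(review, "battery")
```

In a pipeline like the one outlined in the abstract, per-feature scores of this kind would be aggregated per product and supplied to the regression models as the sentiment factor. How the study actually aggregates them is not stated here.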