| With the rapid development of the Internet, online shopping has become a mainstream way to shop. While it brings convenience, it also raises problems, such as how to find goods that offer value for money and how to identify genuinely highly rated goods. With the rise of the paid-review ("Internet water army") industry, it is difficult for users to extract useful information from reviews and to judge product quality accurately. In addition, because each user applies a different standard when scoring products, ratings can carry large errors that distort the user’s judgment. Analyzing product reviews can therefore help users make better purchasing decisions, and sellers can likewise use review analysis to refine their selling strategies. For a product with a large number of reviews, however, manual analysis is time-consuming and labor-intensive. Text sentiment analysis currently relies mostly on supervised learning, which requires a large amount of labeled data. In the review sentiment analysis applications of major Internet companies, manual labeling is too costly, so relatively low-cost unlabeled data must also be used. For the sentiment analysis of product reviews, this paper therefore trains models on a large amount of unlabeled data and a small amount of labeled data and applies the best-performing model to a product review analysis system. The main research contents of this paper are as follows: 1. This research uses RPA (robotic process automation) technology for data acquisition, which is simpler, safer, and more efficient than traditional data crawling methods. The experimental data come from Taobao Mall, which makes the research data more realistic. Product reviews contain a large amount of colloquial and Internet language that the jieba word segmenter cannot recognize well. Based on the Taobao Mall review data, this paper applies a new-word discovery algorithm, and the discovered words are added to the jieba custom dictionary together with the
Alibaba product vocabulary in the Sogou cell lexicon (clothing, shoes, and hats vocabulary), and the custom dictionary is finally used for word segmentation, which makes the segmentation results more accurate. 2. The core of the product review analysis system is the algorithm module. After studying semi-supervised learning methods, this paper selects the self-training method and implements self-training algorithms with several different base classifiers. A diversity and high-confidence estimation method is also used to further filter the pseudo-labeled data produced by predicting the unlabeled samples. The experimental results show that the diversity and high-confidence estimation method improves the accuracy of the model. The traditional incremental self-training algorithm is comprehensively compared with the self-training algorithm based on diversity and high-confidence estimation, and the latter is selected as the algorithm of the product review analysis system. 3. This paper builds a complete product review analysis system that is divided into two modules: the administrator function module and the user function module. The administrator function module mainly manages user information and machine learning model information. The user function module mainly provides registration, login, and product review analysis. For product review analysis, the user submits a set of review data, and the system returns the analysis results for the user’s reference to support buying and selling decisions. |
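
The abstract does not spell out how pseudo-labels are filtered, so the following is only a minimal sketch of the general idea of self-training with a confidence filter and a diversity filter. It assumes a toy Naive Bayes base classifier, a posterior-probability threshold as the high-confidence test, and Jaccard similarity between token sets as a stand-in for the diversity criterion. All function names, thresholds, and sample reviews are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter

def train_nb(samples):
    """Fit a tiny multinomial Naive Bayes model on (tokens, label) pairs."""
    word_counts = {}        # label -> Counter of token frequencies
    doc_counts = Counter()  # label -> number of training documents
    vocab = set()
    for tokens, label in samples:
        word_counts.setdefault(label, Counter()).update(tokens)
        doc_counts[label] += 1
        vocab.update(tokens)
    return word_counts, doc_counts, vocab

def predict_proba(model, tokens):
    """Return {label: posterior probability} with add-one smoothing."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    log_scores = {}
    for label, counts in word_counts.items():
        # The extra +1 reserves probability mass for out-of-vocabulary tokens.
        denom = sum(counts.values()) + len(vocab) + 1
        score = math.log(doc_counts[label] / total_docs)
        for tok in tokens:
            score += math.log((counts[tok] + 1) / denom)
        log_scores[label] = score
    top = max(log_scores.values())
    exp_scores = {k: math.exp(v - top) for k, v in log_scores.items()}
    norm = sum(exp_scores.values())
    return {k: v / norm for k, v in exp_scores.items()}

def jaccard(a, b):
    """Jaccard similarity between two token lists, treated as sets."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 1.0

def self_train(labeled, unlabeled, threshold=0.75, max_sim=0.9, max_rounds=5):
    """Grow the labeled set with confident, non-redundant pseudo-labels."""
    labeled, pool, pseudo = list(labeled), list(unlabeled), []
    for _ in range(max_rounds):
        model = train_nb(labeled)
        accepted, remaining = [], []
        for tokens in pool:
            probs = predict_proba(model, tokens)
            label, conf = max(probs.items(), key=lambda kv: kv[1])
            # Diversity filter (assumed): skip near-duplicates of earlier pseudo-labels.
            near_dup = any(jaccard(tokens, t) > max_sim for t, _ in pseudo + accepted)
            if conf >= threshold and not near_dup:
                accepted.append((tokens, label))
            else:
                remaining.append(tokens)
        if not accepted:  # nothing confident enough: stop early
            break
        labeled.extend(accepted)
        pseudo.extend(accepted)
        pool = remaining
    return train_nb(labeled), labeled, pool

# Hypothetical, already-segmented reviews (tokens would come from the segmenter).
seed = [(["great", "quality", "love"], "pos"),
        (["terrible", "quality", "refund"], "neg")]
unlabeled = [["love", "great", "fabric"],
             ["refund", "terrible", "seller"],
             ["fabric", "love"],
             ["quality"]]

final_model, final_labeled, leftover = self_train(seed, unlabeled)
print(len(final_labeled), leftover)
```

In this toy run, reviews that the current model labels confidently are moved into the training set round by round, while an ambiguous review stays in the unlabeled pool. A real system would replace the toy classifier with the base classifiers under comparison and tune both thresholds on held-out labeled data.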