| With the development of Internet technology, online shopping has become an indispensable part of daily life. Because the truthfulness of product information is hard to verify, product reviews have become an important basis for purchasing decisions, and most consumers are accustomed to reading reviews before they buy. Driven by profit, some merchants hire paid posters to write fake reviews that boost the reputation of their own products and services or damage that of their competitors. Professionally trained fake reviewers can hide well among ordinary users, and it is difficult to identify them quickly and accurately by manual methods. To address this problem, this paper studies automatic feature extraction techniques for fake review detection. The main research contents are as follows. First, domestic and international techniques for fake review feature extraction and detection are summarized. Existing manual feature engineering is reproduced on the Yelp Zip dataset and fed into multiple classification models for comparative analysis; it serves as the baseline for the automatic feature extraction models. Next, a time period is selected and a rating matrix is constructed from the ratings that reviewers gave to products within that period. Each rating in the period is masked in turn, and the Matrix Completion on Fake Review Detect model (MCD) is applied to automatically predict the ratings of all reviewers in the rating matrix. The confidence of each reviewer is measured by the deviation between predicted and actual ratings, and the results are sorted by confidence in ascending order. Reviews at the front of this sequence, those with the lowest confidence, are flagged as fake, and their authors are treated as isolated fake reviewers. Beyond isolated fake reviewers, the cheating behavior of groups of fake reviewers should not be ignored. Inspired by the matrix completion model and drawing on the correlation among the cheating behaviors of fake reviewer groups, this paper proposes a new model for automatic extraction of fake review features based on Matrix factorization with Credibility (MFC). The model introduces confidence vectors T and H for reviewers and products, constructs a loss function from the rating matrix and the time period, and solves it by gradient descent to obtain the reviewer feature vector U. Several machine learning classifiers, including random forest, support vector machine, and Gaussian Bayes, are then applied to identify the most likely group of cheating reviewers. Finally, during classification the extracted fake review features and genuine review features are imbalanced. This paper therefore applies down-sampling to preprocess the automatically extracted feature vector dataset. Multiple classifiers are fitted to the data separately, and the best model and the best down-sampled dataset are selected for subsequent analysis according to the classification evaluation metrics. The results show that the feature vectors extracted by the automatic feature extraction model identify fake reviews effectively and with good stability. Unlike features that rely on manual extraction, this model relies only on the correlation between user rating data and cheating behavior, which enables efficient automatic feature extraction and improves overall classification and detection performance. Compared with manual feature engineering, the AUC is maintained at 56% and the F1 score increases by 12%. |
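
The masking-and-prediction step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formulation of MCD, so the code below substitutes a plain regularized matrix factorization trained by stochastic gradient descent; the function names (`factorize`, `masked_deviations`, `rank_by_confidence`), the hyperparameters, the toy ratings, and the choice to treat a larger deviation as lower confidence are all assumptions made for illustration, not the paper's implementation.

```python
import random

def factorize(ratings, n_users, n_items, rank=2, lr=0.02, reg=0.05,
              epochs=300, seed=0):
    """Fit R ~= U V^T by SGD on the observed (user, item) -> rating entries.

    Stand-in for the matrix completion model; hyperparameters are assumptions.
    """
    rng = random.Random(seed)
    U = [[rng.uniform(0.1, 1.0) for _ in range(rank)] for _ in range(n_users)]
    V = [[rng.uniform(0.1, 1.0) for _ in range(rank)] for _ in range(n_items)]
    entries = list(ratings.items())
    for _ in range(epochs):
        rng.shuffle(entries)
        for (u, i), r in entries:
            err = r - sum(a * b for a, b in zip(U[u], V[i]))
            for k in range(rank):
                uk, vk = U[u][k], V[i][k]
                U[u][k] += lr * (err * vk - reg * uk)
                V[i][k] += lr * (err * uk - reg * vk)
    return U, V

def masked_deviations(ratings, n_users, n_items):
    """Mask each observed rating in turn, predict it from the rest,
    and record the absolute deviation |predicted - actual|."""
    deviations = {}
    for key, actual in ratings.items():
        held_out = {k: v for k, v in ratings.items() if k != key}
        U, V = factorize(held_out, n_users, n_items)
        u, i = key
        predicted = sum(a * b for a, b in zip(U[u], V[i]))
        deviations[key] = abs(predicted - actual)
    return deviations

def rank_by_confidence(deviations):
    """Order (user, item) pairs by ascending confidence.

    Assumption: confidence decreases as deviation grows, so the largest
    deviations come first.
    """
    return sorted(deviations, key=lambda k: -deviations[k])

# Hypothetical toy ratings for one time period: (reviewer, product) -> stars.
ratings = {
    (0, 0): 5, (0, 1): 4, (0, 2): 5,
    (1, 0): 5, (1, 1): 4, (1, 2): 5,
    (2, 0): 4, (2, 1): 4, (2, 2): 5,
    (3, 0): 1, (3, 1): 4, (3, 2): 5,
}
deviations = masked_deviations(ratings, n_users=4, n_items=3)
ranking = rank_by_confidence(deviations)
print(ranking[:3])  # candidate low-confidence (reviewer, product) pairs
```

Refitting the factorization once per masked rating is only practical for small matrices; it is used here because it mirrors the "one by one" procedure in the text most directly.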