| With the rapid growth of e-commerce, demand for online shopping keeps increasing and has become a trend. Online shopping cannot provide the same experience as offline shopping, where customers can see and try products in person. Consumers therefore often rely on product reviews to judge product quality, which makes the authenticity of those reviews extremely important. In practice, product reviews often contain hidden fake reviews that can mislead consumers: fake reviews that favor the seller can increase product sales, while fake reviews that are detrimental to the seller can decrease them. Many researchers have worked on detecting fake reviews and have achieved significant results, but some issues remain: (1) Fake reviews are becoming increasingly sophisticated in their disguise, so new features are needed to identify them. (2) The accuracy of fake review identification needs to be improved. In response, this thesis obtains officially labeled data from the Yelp website, mines new attributes from it, optimizes a decision tree model based on the Gini index, and uses the optimized model to select features from the attributes and verify their effectiveness. The effective features are then used with a graph neural network model to identify fake reviews. The main work is as follows: (1) Attribute mining based on text analysis: we crawled hotel reviews, hotel information, and user information for the Chicago area of the United States from the Yelp website. The data was preprocessed to obtain clean data, and the crawled text was segmented. Hierarchical clustering and classification were then performed to identify behavioral attributes and content attributes. (2) To alleviate the problem of imbalanced data, we propose the ROS-S algorithm, which combines random oversampling with stratified sampling. (3) A decision tree model is used to select effective features from the attributes. (4) Camouflage evidence experiments are used to draw conclusions about feature camouflage and relation camouflage in the dataset, and the graph neural network model CARE-GNN is then used to identify fake reviews. (5) Comparison experiments showed that classification performance improved significantly on the dataset optimized by the ROS-S algorithm. The innovations of this thesis are: (1) Data attributes are mined with text analysis techniques, and effective features are selected with a decision tree based on the Gini index. (2) The ROS-S algorithm is proposed to alleviate the problem of imbalanced data. The algorithm is applied to the data used by the decision tree and graph neural network models, and comparison experiments verify its effectiveness. It keeps the proportion of samples in each category consistent between the training set and the test set after the data is split, increases the amount of training data through random oversampling, and ultimately improves the performance of the models. |
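
The abstract describes ROS-S only at a high level: a stratified split that keeps class proportions consistent between the training and test sets, followed by random oversampling of the training set. The following is a minimal sketch of that idea, not the thesis implementation. The function names `stratified_split` and `random_oversample`, the 20% test fraction, the toy labels, and the choice to oversample every class up to the size of the largest class are all assumptions made for illustration.

```python
import random
from collections import Counter, defaultdict


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Split sample indices so each class keeps roughly the same share in train and test."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return train_idx, test_idx


def random_oversample(train_idx, labels, seed=0):
    """Randomly duplicate training indices of smaller classes.

    Assumption: every class is grown to the size of the largest class;
    the abstract does not state the target ratio.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx in train_idx:
        by_class[labels[idx]].append(idx)
    target = max(len(indices) for indices in by_class.values())
    balanced = []
    for indices in by_class.values():
        balanced.extend(indices)
        # Sample with replacement to fill the gap up to the target size.
        balanced.extend(rng.choices(indices, k=target - len(indices)))
    rng.shuffle(balanced)
    return balanced


# Toy labels (hypothetical): 1 = fake review (minority), 0 = genuine review (majority).
labels = [1] * 20 + [0] * 80
train_idx, test_idx = stratified_split(labels, test_fraction=0.2)
balanced_train_idx = random_oversample(train_idx, labels)

print("test set:", Counter(labels[i] for i in test_idx))
print("training set after oversampling:", Counter(labels[i] for i in balanced_train_idx))
```

On this toy data the test set keeps the original 1:4 ratio of fake to genuine reviews (4 and 16 samples), while the training set grows from 16 fake and 64 genuine samples to 64 of each. Only the training indices are oversampled, so no duplicated review leaks into the test set.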