| With the development of the mobile Internet, mobile applications have become the primary way for people to get information. The app market is the main channel for downloading apps, and the reviews it hosts are an important way for developers to get user feedback and a significant reference for other users deciding whether to download an app. However, app markets currently contain many spam reviews. These reviews dilute the usefulness of the review information under an application and may induce users to download malicious applications through falsely positive reviews, which seriously disrupts the normal order of the application market. Existing research on spam review detection has two main problems. First, there is little research on detecting irrelevant spam, and some current supervised models cannot adapt to the drastic changes in application and review characteristics in the application market. Second, most fake spam detection work does not adequately incorporate the relational features between entities in the application market, and no feature design targets the application market specifically. To address these two issues, this paper makes the following contributions. 1) Based on real application market data, we analyze abnormal reviews and divide all reviews into four categories along two dimensions: relevance to the application and authenticity of the reviewer's motivation. Three of the four categories are defined as spam reviews, and the spam review detection problem is decomposed into two sub-problems: irrelevant spam detection and fake spam detection. 2) A new weakly supervised irrelevant comment detection algorithm, IrSD, is proposed. It introduces an automatic sample labeling method that uses a topic model and seed anomalous reviews to automatically label some samples in the original dataset, with a labeling accuracy of more than 90%. The review relevance detector, obtained by weakly supervised training on the automatically labeled data together with unlabeled data, achieves an F1 score of 84.94% on the test set, which can greatly reduce the manual labeling workload. 3) A new graph neural network-based algorithm for spam review detection, FaSD, is proposed. For the real scenario of the application market, two new node features are proposed for the first time: the time distribution of user comments and the change trend of application keyword coverage. FaSD also includes a time window-based neighbor sampling algorithm to better aggregate the features of surrounding nodes. Finally, graph neural network techniques are applied to detect false comments in the application market, and experiments show that the proposed algorithm achieves a maximum F1 score of 90.22%. |
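
The abstract does not say how the time window-based neighbor sampling in FaSD selects neighbors, so the following is only a minimal sketch of one plausible reading. It assumes that a review's neighbors are other reviews of the same application posted within a fixed time window, and that at most `k` of them are sampled at random. The names `Review`, `sample_neighbors_in_window`, `window_seconds`, and `k` are hypothetical and are not taken from the paper.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Review:
    """Hypothetical review node; the fields are illustrative assumptions."""
    review_id: str
    app_id: str
    timestamp: float  # posting time in seconds


def sample_neighbors_in_window(target, reviews, window_seconds, k, seed=0):
    """Return up to k reviews of the same app posted within window_seconds of target.

    Assumed sampling rule (not the paper's): candidates are filtered by app
    and by absolute time difference, then sampled uniformly without replacement.
    """
    candidates = [
        r for r in reviews
        if r.review_id != target.review_id
        and r.app_id == target.app_id
        and abs(r.timestamp - target.timestamp) <= window_seconds
    ]
    if len(candidates) <= k:
        return candidates
    return random.Random(seed).sample(candidates, k)


# Example: r3 is outside the window and r4 belongs to another app.
reviews = [
    Review("r1", "app_a", 0.0),
    Review("r2", "app_a", 50.0),
    Review("r3", "app_a", 500.0),
    Review("r4", "app_b", 10.0),
    Review("r5", "app_a", 90.0),
]
neighbors = sample_neighbors_in_window(reviews[0], reviews, window_seconds=100.0, k=5)
```

Restricting aggregation to temporally close reviews is one way a graph model might focus on bursts of coordinated reviews; the actual windowing and sampling strategy used by FaSD may differ.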