Design and Implementation of an Anti-Cheating Model for Crowdsourced Data Collection Scenarios

Posted on: 2020-09-12
Degree: Master
Type: Thesis
Country: China
Candidate: Q L Xiao
Full Text: PDF
GTID: 2428330620451072
Subject: Control Science and Engineering
Abstract/Summary:
Data crowdsourcing is often used to acquire data for various life-service apps. Apps of this kind currently active on the market include Penguin Maps, Meituan Shops, etc. At present, many users cheat through various means to obtain payment from the platform, which poses great challenges to the operation of this acquisition mode. This topic originates from the anti-cheating department of a crowdsourcing data acquisition app. Given the low cost of cheating and the high cost of auditing in this acquisition mode, and starting from the need to crack down harder on user cheating, an anti-cheating model based on user behavior characteristics was established. The main work of this paper is as follows:

1. Based on the behavior characteristics of cheating users, a cheating user identification model is established to automatically mine such users. At the same time, according to prior knowledge, a monitoring and early warning system based on clustering analysis is established, which can distinguish some abnormal user groups and feed them back to the auditing department. The anti-cheating system is composed of the cheating user identification model and the monitoring and early warning system.
2. In the cheating user identification model, this topic establishes a classification model over the behavior characteristics of individual users, using the collected user-related data. Because historical data is insufficient and cheating users are few in the sample, the cheating-user samples are oversampled to address class imbalance. Because a model trained on randomly oversampled data tends to generalize poorly, this paper uses oversampling based on the SMOTE algorithm and oversampling based on the ADASYN algorithm, respectively, to reconstruct the samples. By comparing the evaluation metrics of the models obtained with these two methods, the SMOTE-based oversampling method is finally selected.
3. Because traditional classification methods perform poorly on imbalanced samples, this paper adopts the CART and random forest algorithms, which classify imbalanced samples better, to train the model. The evaluation metrics of the two models on the test set are then compared, and the cheating user identification model based on the random forest algorithm is chosen.
4. In the monitoring and early warning model, user behavior characteristics are selected according to business-related prior knowledge, and clustering analysis is performed on them. Because the traditional K-Means algorithm chooses its initial centroids randomly, which can make the model converge too slowly or to a local optimum, the K-Means++ algorithm, an improvement on K-Means, and an agglomerative hierarchical clustering algorithm are used to construct the monitoring and early warning system. The system can mine some abnormal user groups and feed them back to the auditing department, helping it discover new cheating methods or abnormal operations.

The system can automatically identify cheating users of known cheating types and monitor and warn of users' abnormal operation behavior, thus addressing the low cheating cost and high audit cost faced by this data acquisition mode. After the anti-cheating system went online, the model effectively cracked down on user cheating, improved the efficiency of data acquisition, and reduced operation and audit costs.
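
The abstract does not say how the SMOTE-based oversampling was implemented. As an illustration only, the core idea of SMOTE (creating synthetic minority samples by interpolating between a minority sample and one of its nearest minority neighbours) can be sketched in plain Python. The function name `smote_like_oversample`, its parameters, and the toy feature vectors are assumptions made for this sketch, not taken from the thesis.

```python
import math
import random


def smote_like_oversample(minority, n_new, k=3, seed=0):
    """Return n_new synthetic points built by SMOTE-style interpolation.

    minority: list of equal-length numeric tuples (hypothetical cheating-user
    feature vectors). This is a simplified sketch, not the thesis's code.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # The k nearest minority neighbours of the chosen base sample.
        others = minority[:i] + minority[i + 1:]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        # Pick a random position on the segment from base to neighbour.
        gap = rng.random()
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Toy usage: four minority samples at the corners of the unit square.
corners = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
new_points = smote_like_oversample(corners, n_new=10, k=2, seed=1)
```

Unlike random oversampling, which only duplicates existing minority rows, each synthetic point here is a new position between two real minority samples. That difference is the usual motivation for preferring SMOTE when duplicated rows hurt generalization.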
Keywords/Search Tags:Anti-cheating, Cluster analysis, Sample imbalance, Decision tree, Random-Forest
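
For the cluster analysis step, the abstract names K-Means++ as the remedy for random centroid initialization but gives no implementation details. The following is a minimal sketch of the K-Means++ seeding rule in plain Python, assuming numeric feature tuples. The function name `kmeans_pp_init` and the sample points are hypothetical.

```python
import math
import random


def kmeans_pp_init(points, k, seed=0):
    """Choose k initial centroids using K-Means++ seeding.

    The first centroid is drawn uniformly at random. Each later centroid is
    drawn with probability proportional to its squared distance from the
    nearest centroid chosen so far.
    """
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    rng = random.Random(seed)
    centroids = [rng.choice(points)]
    while len(centroids) < k:
        weights = [
            min(math.dist(p, c) ** 2 for c in centroids) for p in points
        ]
        if sum(weights) == 0:
            # Every point coincides with a chosen centroid; fall back to
            # a uniform draw so the loop still terminates.
            centroids.append(rng.choice(points))
            continue
        centroids.append(rng.choices(points, weights=weights, k=1)[0])
    return centroids


# Toy usage: two well-separated groups of hypothetical user feature vectors.
users = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
         (100.0, 100.0), (100.0, 101.0), (101.0, 100.0)]
seeds = kmeans_pp_init(users, k=2, seed=0)
```

Because points far from every existing centroid get a larger weight, the seeds tend to spread across the data, which is why this initialization typically needs fewer iterations than purely random seeding and is less likely to end in a poor local optimum.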