Research On The Statistical Machine Learning Method Of Mobile Phone APP False User Identification

Posted on: 2020-06-03
Degree: Master
Type: Thesis
Country: China
Candidate: Y X Li
GTID: 2437330572479817
Subject: Applied Statistics
Abstract/Summary:
With the rapid growth in the number of internet users, and of mobile internet users in particular, mobile services have become an indispensable part of daily life, so the development prospects for mobile apps are especially strong. At the same time, this has led some app development companies to promote their apps by improper and unethical means, such as using fake users to inflate their positions in the ranking lists. As a result, the proportion of real users keeps shrinking, which leaves the mobile phone user data unevenly distributed across classes. For enterprises, it becomes difficult to identify feedback from real users of an app. For users, the choice of which app to download is influenced by fake app store rankings and download counts, so they may well fail to find the app that actually suits them. This thesis uses statistical machine learning methods to predict and classify the authenticity of mobile phone app users, so that users and enterprises can avoid unnecessary losses. Imbalanced data are generally handled in two steps. The first step works on the data, mainly through resampling methods such as oversampling and undersampling. The second step works on the model, mainly through methods such as neural networks, random forests, and support vector machines. The thesis first carries out a visual analysis of each variable and examines the relationship between each variable and user authenticity. It then builds random forest and support vector machine models, combined with cross-validation and undersampling, to predict user authenticity. The results show that the overall classification accuracy is above 95%, and that the accuracy on the small-sample class is as high as 85%. A comparison of the models shows that the random forest model with undersampling is the most suitable for the data in this thesis, with good and stable predictive performance.
Keywords/Search Tags: Imbalanced data, Support vector machine, Random forest, Cross-validation, Undersampling
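
The undersampling step named in the abstract, and the reason the abstract reports accuracy on the small-sample class separately from overall accuracy, can be illustrated with a minimal sketch. It is not the thesis's implementation: the function names `random_undersample` and `per_class_accuracy`, the toy data (95 real users labeled 0, 5 fake users labeled 1), and the always-"real" baseline are all assumptions made for illustration, and the random forest and support vector machine models themselves are left out.

```python
import random
from collections import Counter


def random_undersample(rows, labels, seed=0):
    """Randomly drop rows until every class is as small as the rarest class.

    Hypothetical helper for illustration; not taken from the thesis.
    """
    rng = random.Random(seed)
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(label, []).append(row)
    n_min = min(len(group) for group in groups.values())
    sampled = []
    for label, group in groups.items():
        # Keep n_min rows of each class, chosen without replacement.
        sampled.extend((row, label) for row in rng.sample(group, n_min))
    rng.shuffle(sampled)
    return [row for row, _ in sampled], [label for _, label in sampled]


def per_class_accuracy(y_true, y_pred):
    """Share of correctly predicted samples within each true class."""
    total = Counter(y_true)
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {label: correct[label] / total[label] for label in total}


# Toy imbalanced data (assumed): 95 real users (0) and 5 fake users (1).
labels = [0] * 95 + [1] * 5
rows = [[i] for i in range(100)]

# After undersampling, both classes have 5 rows each.
bal_rows, bal_labels = random_undersample(rows, labels)
class_counts = Counter(bal_labels)

# A baseline that always predicts "real" looks accurate overall
# while never identifying a single fake user.
always_real = [0] * len(labels)
overall = sum(t == p for t, p in zip(labels, always_real)) / len(labels)
recalls = per_class_accuracy(labels, always_real)
```

On this toy data the always-"real" baseline scores 95% overall and 0% on the rare class, which is why a figure for the small-sample class is informative alongside the overall rate. In practice the balanced rows would be used only to fit a model inside each cross-validation training fold, with the held-out fold left at its original class ratio.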