Research On The Statistical Machine Learning Method Of Mobile Phone APP False User Identification

Posted on:2020-06-03

Degree:Master

Type:Thesis

Country:China

Candidate:Y X Li

Full Text:PDF

GTID:2437330572479817

Subject:Applied Statistics

Abstract/Summary:


In an environment where the number of internet users, and of mobile internet users in particular, is growing rapidly, mobile services have become an indispensable part of daily life, and the development prospects of mobile APPs are accordingly very promising. At the same time, this has led some APP developers to promote their own APPs through improper and unethical means, such as using fake users to manipulate app-store rankings. As a result, real users make up an ever smaller share of all users, and mobile phone user data become imbalanced. Moreover, enterprises find it difficult to identify genuine user feedback about their APPs, while users deciding what to download are influenced by fake app-store rankings and inflated download counts, so they may well fail to find an APP that actually suits them. This paper uses statistical machine learning methods to predict and classify whether mobile phone APP users are real, so that users and enterprises can avoid unnecessary losses. Handling imbalanced data generally involves two steps: the first is data partitioning, whose main methods include oversampling and undersampling; the second is modeling, whose main methods include neural networks, random forests and support vector machines. The paper first carries out a visual analysis of each variable and examines the relationship between each variable and user authenticity. It then predicts user authenticity with random forest and support vector machine models built with cross-validation and undersampling. The results show that the overall classification accuracy is above 95%, and the accuracy on the small-sample (minority) class is as high as 85%. A comparison of the models shows that the undersampled random forest model is better suited to the data in this paper, with good and stable predictive performance.
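The abstract does not include the thesis's code, features or data. What follows is a minimal sketch, using only the Python standard library, of the general pattern it names: random undersampling of the larger class applied inside stratified k-fold cross-validation, with accuracy reported both overall and on the minority class. All function names are hypothetical. The `NearestCentroid` classifier is a deliberately simple stand-in for the random forest and SVM used in the thesis (in practice one might plug in scikit-learn's `RandomForestClassifier` or `SVC` instead), and the two-cluster data is synthetic toy data, not the thesis's user data.

```python
import random
from collections import Counter


def random_undersample(X, y, seed=0):
    """Randomly drop rows of the larger classes until every class has as many rows as the rarest one."""
    rng = random.Random(seed)
    by_class = {}
    for xi, yi in zip(X, y):
        by_class.setdefault(yi, []).append(xi)
    n_min = min(len(rows) for rows in by_class.values())
    X_bal, y_bal = [], []
    for label, rows in by_class.items():
        for xi in rng.sample(rows, n_min):  # sample without replacement
            X_bal.append(xi)
            y_bal.append(label)
    return X_bal, y_bal


def stratified_folds(y, k, seed=0):
    """Split row indices into k folds, dealing each class out round-robin so folds keep the class ratio."""
    rng = random.Random(seed)
    by_class = {}
    for i, yi in enumerate(y):
        by_class.setdefault(yi, []).append(i)
    folds = [[] for _ in range(k)]
    for idx in by_class.values():
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    return folds


class NearestCentroid:
    """Hypothetical stand-in for a random forest or SVM: predicts the class with the closest feature mean."""

    def fit(self, X, y):
        counts = Counter(y)
        sums = {}
        for xi, yi in zip(X, y):
            s = sums.setdefault(yi, [0.0] * len(xi))
            for d, v in enumerate(xi):
                s[d] += v
        self.centroids = {c: [v / counts[c] for v in s] for c, s in sums.items()}
        return self

    def predict(self, X):
        def sq_dist(a, b):
            return sum((p - q) ** 2 for p, q in zip(a, b))
        return [min(self.centroids, key=lambda c: sq_dist(xi, self.centroids[c])) for xi in X]


def cv_undersampled(X, y, make_model, k=5, minority=1, seed=0):
    """k-fold CV that undersamples only the training folds; test folds keep the true imbalance.

    Returns (overall accuracy, minority-class accuracy) pooled over all k test folds.
    """
    folds = stratified_folds(y, k, seed)
    correct = minority_correct = minority_total = 0
    for f, test_idx in enumerate(folds):
        test_set = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in test_set]
        X_tr, y_tr = random_undersample([X[i] for i in train_idx], [y[i] for i in train_idx], seed + f)
        preds = make_model().fit(X_tr, y_tr).predict([X[i] for i in test_idx])
        for i, p in zip(test_idx, preds):
            correct += p == y[i]
            if y[i] == minority:
                minority_total += 1
                minority_correct += p == y[i]
    return correct / len(y), minority_correct / minority_total


# Toy synthetic data (NOT the thesis's data): 950 majority rows around (0, 0), 50 minority rows around (3, 3).
rng = random.Random(42)
X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(950)]
X += [[rng.gauss(3, 1), rng.gauss(3, 1)] for _ in range(50)]
y = [0] * 950 + [1] * 50
acc, minority_acc = cv_undersampled(X, y, NearestCentroid, k=5, minority=1)
print(f"overall accuracy = {acc:.3f}, minority-class accuracy = {minority_acc:.3f}")
```

Undersampling is applied only to the training folds: if the whole dataset were balanced before splitting, the test folds would no longer reflect the real class ratio, and the reported accuracy would not describe performance on the imbalanced population the model is meant for. The minority-class figure is reported alongside overall accuracy because, on this toy data, a classifier that always predicts the majority class would already score 0.95 overall while never identifying a single minority row.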

Keywords/Search Tags:

Imbalanced data, Support vector machine, Random forest, Cross validation, Undersampling


Related items

1	EEG Signal Classification Based On Iterative Random Forest Algorithm
2	Inverse Distance Weighted Support Vector Machine On High-Dimension Low-Sample Size Data And Class-Imbalance Data
3	Several Classification Algorithms And Their Applications In Statistical Learning
4	Analysis And Research Based On Multivariate Statistics And Machine Learning
5	Application Of Support Vector Machine In The Analysis Of Population Data
6	Neuron Classification Based On Their Morphological Features
7	A Research On Learning Process Evaluation Based On Support Vector Machine
8	Statistical Research On Identifying Transaction Risks Based On Consumption Process Data
9	Mining Web-based Learning System Data To Detect Different Pattern Of The Student During Completing Course
10	Research On High Dimensional Imbalanced Data Classification Based On Random Forest