Adversarial User Recognition Based On Integrated SSL Under Internet Background

Posted on: 2019-12-10  Degree: Master  Type: Thesis
Country: China  Candidate: W J Zhang
GTID: 2417330575953633  Subject: Statistics
Abstract/Summary:
The rapid development of the mobile Internet has brought new challenges to statistical learning. The massive data collected by smartphones in this setting has three characteristics: high dimensionality, weak correlation between features and the target, and a massive number of unlabeled sample points combined with only a limited number of labeled ones. In response to these problems, this paper surveys the main types of high-dimensional statistical classification algorithms and, in view of the weak correlation between features and the target, focuses on a recent nonparametric method that addresses dimension reduction for weakly correlated features: Feature Augmentation via Nonparametrics and Selection (FANS).

This paper constructs a new integrated (ensemble) semi-supervised learning algorithm based on FANS and logistic regression with a norm penalty term: the adversarial user recognition algorithm. Its core mechanism exploits the divergence between the two algorithms and the advantage of FANS under cold start. The two algorithms are trained in turn; each picks the samples it classifies with the highest confidence from the unlabeled sample set, assigns them the corresponding labels, and adds them to its opponent's training set. From the perspective of semi-supervised learning, adding pseudo-labeled data to the initial training set inevitably introduces mislabeling noise, so two goals are bound to conflict. On the one hand, one wants to add more pseudo-labeled data to expand the training set; on the other hand, one wants to add less pseudo-labeled data to reduce the noise introduced. The semi-supervised learning literature generally does not resolve this conflict. By reviewing earlier literature, however, this paper finds that a statistical learning framework from the 1980s for training sets that contain noise, noise learning theory, can effectively address it. Based on noise learning theory, one can evaluate whether the accuracy of the pseudo-labeled sample group added in each round exceeds a tolerable lower bound, which in turn guides the setting of hyperparameters.

Finally, this paper implements a classification algorithm that can learn from high-dimensional, weakly correlated datasets with massive unlabeled samples: the adversarial user recognition algorithm. Its advantages are as follows. First, it can improve overall classification accuracy. Second, it can identify a high-confidence sample group within the unlabeled samples and give a lower bound on the classification accuracy of that group. Third, a hyperparameter tuning strategy can be derived from whether the size of the high-confidence group converges.
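
The alternating pseudo-labeling scheme described in the abstract can be sketched roughly as follows. This is an illustrative sketch only, not the thesis's implementation: the abstract gives no code, so CentroidLearner and NaiveBayesLearner are hypothetical stand-ins for FANS and penalized logistic regression, and co_train, rounds, k, and the toy data are assumed names and values. The noise-learning check on each pseudo-labeled group is not modeled here.

```python
import math


def sigmoid(t):
    """Logistic function, clamped to avoid overflow in math.exp."""
    t = max(-50.0, min(50.0, t))
    return 1.0 / (1.0 + math.exp(-t))


def class_rows(X, y, c):
    """Rows of X whose label equals c."""
    return [x for x, label in zip(X, y) if label == c]


class CentroidLearner:
    """Hypothetical stand-in learner: compares distances to the class means."""

    def fit(self, X, y):
        self.means = [[sum(col) / len(col) for col in zip(*class_rows(X, y, c))]
                      for c in (0, 1)]
        return self

    def prob1(self, x):
        d0, d1 = (math.sqrt(sum((v - m) ** 2 for v, m in zip(x, mean)))
                  for mean in self.means)
        return sigmoid(d0 - d1)  # nearer to the class-1 mean gives a higher score


class NaiveBayesLearner:
    """Hypothetical stand-in learner: Gaussian naive Bayes, smoothed variances."""

    def fit(self, X, y):
        self.stats = []
        for c in (0, 1):
            cols = list(zip(*class_rows(X, y, c)))
            mu = [sum(col) / len(col) for col in cols]
            var = [sum((v - m) ** 2 for v in col) / len(col) + 0.5
                   for col, m in zip(cols, mu)]
            self.stats.append((mu, var, len(cols[0]) / len(X)))
        return self

    def _log_joint(self, x, c):
        mu, var, prior = self.stats[c]
        return math.log(prior) + sum(
            -0.5 * math.log(2 * math.pi * s) - (v - m) ** 2 / (2 * s)
            for v, m, s in zip(x, mu, var))

    def prob1(self, x):
        return sigmoid(self._log_joint(x, 1) - self._log_joint(x, 0))


def co_train(learner_a, learner_b, X_lab, y_lab, X_unlab, rounds=2, k=1):
    """Alternate two learners; each pseudo-labels k samples for its opponent."""
    learners = [learner_a, learner_b]
    # Each learner keeps its own training set, seeded with the labeled data.
    train = [(list(X_lab), list(y_lab)), (list(X_lab), list(y_lab))]
    pool = list(X_unlab)
    for _ in range(rounds):
        for i in (0, 1):
            if not pool:
                break
            me = learners[i].fit(*train[i])
            # Rank the pool by confidence: distance of the score from 0.5.
            pool.sort(key=lambda x: abs(me.prob1(x) - 0.5), reverse=True)
            picked, pool = pool[:k], pool[k:]
            # Pseudo-label the picks and hand them to the other learner.
            X_other, y_other = train[1 - i]
            for x in picked:
                X_other.append(x)
                y_other.append(1 if me.prob1(x) >= 0.5 else 0)
    for i in (0, 1):
        learners[i].fit(*train[i])  # final refit on the expanded sets
    return learners, train


# Toy data (made up for illustration): two well-separated clusters.
X_lab = [[-1, -1], [1, 1], [-1, 1], [1, -1], [5, 5], [7, 7], [5, 7], [7, 5]]
y_lab = [0, 0, 0, 0, 1, 1, 1, 1]
X_unlab = [[0.5, 0.5], [5.5, 6.2], [-0.5, 0.2], [6.4, 5.8], [0.2, -0.6], [6.1, 6.5]]
learners, train = co_train(CentroidLearner(), NaiveBayesLearner(),
                           X_lab, y_lab, X_unlab)
```

In this sketch k and rounds play the role of the hyperparameters that trade off training-set expansion against pseudo-label noise; a larger k adds more data per round but admits lower-confidence samples.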
Keywords/Search Tags: Ensemble Learning, Semi-supervised Learning, High-Dimensional Statistics, User Identification