Adversarial User Recognition Based On Integrated SSL Under Internet Background

Posted on:2019-12-10

Degree:Master

Type:Thesis

Country:China

Candidate:W J Zhang

GTID:2417330575953633

Subject:Statistics

Abstract/Summary:

The rapid development of the mobile Internet has brought new challenges to statistical learning. The massive data collected by smartphones in this setting has three characteristics: high dimensionality, weak correlation between features and the target, and a large number of unlabeled sample points combined with only a limited number of labeled ones. In response to these problems, this paper reviews several main types of high-dimensional statistical classification algorithms and, given the weak correlation between features and target, focuses on a recent nonparametric method that can handle dimension reduction for weakly correlated features: Feature Augmentation via Nonparametrics and Selection (FANS).

This paper constructs a new ensemble semi-supervised learning algorithm based on the FANS algorithm and logistic regression with a norm penalty term: the adversarial user recognition algorithm. The core mechanism of the method is to exploit the divergence between the two algorithms and the advantage of FANS at cold start. The two algorithms are trained in sequence; each picks the samples with the highest confidence from the unlabeled sample set, assigns them the corresponding labels, and adds them to the opponent's training data set.

From the perspective of semi-supervised learning, adding pseudo-labeled data to the initial training set inevitably introduces mislabeling noise. There are therefore two contradictory goals: on the one hand, one wants to add more pseudo-labeled data to expand the training set; on the other hand, one wants to add fewer pseudo-labeled data to reduce the level of noise introduced. The semi-supervised learning literature generally does not resolve this contradiction. However, this paper surveys earlier literature and finds that noise learning theory, a statistical learning approach from the 1980s that deals with noise in the training set, can effectively address the problem. Based on noise learning theory, one can evaluate whether the accuracy of each round of added pseudo-labeled samples exceeds the tolerable lower bound, which in turn provides guidance for setting the hyperparameters.

Finally, this paper implements a classification algorithm that can learn from high-dimensional, weakly correlated data sets with massive unlabeled samples: the adversarial user recognition algorithm. Its advantages are: first, it can improve overall classification accuracy; second, it can identify a high-confidence sample group within the unlabeled samples and give a lower bound on the classification accuracy of that group; third, it can give a hyperparameter adjustment strategy based on whether the size of the high-confidence group converges.
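
The sequential pseudo-labeling mechanism described in the abstract (two learners, each handing its most confident unlabeled samples to the opponent's training set) might be sketched roughly as follows. This is only an illustration under stated assumptions, not the thesis's implementation: a Gaussian naive Bayes classifier stands in for FANS, the norm penalty is assumed to be L1, and every name and default here (`co_train`, `rounds`, `per_round`, the variance floor, the toy data) is hypothetical.

```python
import math
import random

def _sigmoid(z):
    z = max(-50.0, min(50.0, z))  # clip to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-z))

def fit_naive_bayes(X, y):
    """Stand-in for FANS (assumption): Gaussian naive Bayes, x -> P(y=1 | x)."""
    stats = {}
    for c in (0, 1):
        rows = [x for x, t in zip(X, y) if t == c]
        cols = list(zip(*rows))
        means = [sum(col) / len(rows) for col in cols]
        # Variance floor (arbitrary) keeps tiny classes from degenerate densities.
        varis = [sum((v - m) ** 2 for v in col) / len(rows) + 0.1
                 for col, m in zip(cols, means)]
        stats[c] = (means, varis, len(rows) / len(X))

    def log_joint(x, c):
        means, varis, prior = stats[c]
        return math.log(prior) + sum(
            -0.5 * math.log(2 * math.pi * s) - (v - m) ** 2 / (2 * s)
            for v, m, s in zip(x, means, varis))

    return lambda x: _sigmoid(log_joint(x, 1) - log_joint(x, 0))

def fit_l1_logistic(X, y, lr=0.05, epochs=100, l1=0.01):
    """Logistic regression with an L1 subgradient penalty, x -> P(y=1 | x)."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for x, t in zip(X, y):
            g = _sigmoid(b + sum(wi * xi for wi, xi in zip(w, x))) - t
            w = [wi - lr * (g * xi + l1 * ((wi > 0) - (wi < 0)))
                 for wi, xi in zip(w, x)]
            b -= lr * g
    return lambda x: _sigmoid(b + sum(wi * xi for wi, xi in zip(w, x)))

def co_train(X_lab, y_lab, X_unl, rounds=3, per_round=2):
    """Each learner in turn pseudo-labels its most confident pool samples
    and adds them to the opponent's training set."""
    sets = {"a": (list(X_lab), list(y_lab)), "b": (list(X_lab), list(y_lab))}
    fits = {"a": fit_naive_bayes, "b": fit_l1_logistic}
    pool = list(X_unl)
    for _ in range(rounds):
        for me, opponent in (("a", "b"), ("b", "a")):
            proba = fits[me](*sets[me])
            # Rank the pool by confidence |p - 0.5| and take the top samples.
            pool.sort(key=lambda x: abs(proba(x) - 0.5), reverse=True)
            picked, pool = pool[:per_round], pool[per_round:]
            for x in picked:
                sets[opponent][0].append(x)
                sets[opponent][1].append(int(proba(x) >= 0.5))
    return sets, pool

# Toy data (hypothetical): two Gaussian clusters, few labels, many unlabeled.
rng = random.Random(0)

def sample(center, n):
    return [[rng.gauss(center, 1.0), rng.gauss(center, 1.0)] for _ in range(n)]

X_lab = sample(-2.0, 3) + sample(2.0, 3)
y_lab = [0, 0, 0, 1, 1, 1]
X_unl = sample(-2.0, 20) + sample(2.0, 20)
sets, pool = co_train(X_lab, y_lab, X_unl)
print(len(sets["a"][0]), len(sets["b"][0]), len(pool))
```

In this sketch `per_round` plays the role of the hyperparameter behind the trade-off the abstract describes: a larger value grows the training sets faster but admits lower-confidence, and therefore noisier, pseudo-labels. The sketch does not implement the noise-learning-theory check on the accuracy lower bound.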

Keywords/Search Tags:

Ensemble Learning, Semi-supervised Learning, High-Dimensional Statistics, User Identification

Related items

1	A Study On Risk Identification Of P2P Lending Platform Based On Semi-supervised Learning
2	Applied Research On Employment Of University Students Based On Semi-supervised Learning
3	Semi-supervised Classification Research Based On Self-paced Learning And Sparse Self-expression
4	Research On Automatic Summary Technology Of Patent Texts Based On Semi-supervised Deep Learning
5	Semi-Supervised Chinese Text Classification Based On Selective Integration
6	Research And Application Of A Semi-supervised Clustering To Student Dormitory Assignment And Improvement Based On Bayesian Statistics
7	Semi-supervised Clustering Algorithm Based On Single Linkage Clustering
8	Research On Accurate Identification Of Poor Students In Campus Based On Ensemble Learning Algorithm
9	Research On Application Of Employment Guidance For College Graduates Based On Improved Semi-Supervised Self-Training Method
10	Testing The Homogeneity Of Two High-dimensional Population Covariance Matrices