Study On Label Noise In The Classification

Posted on:2020-02-29

Degree:Master

Type:Thesis

Country:China

Candidate:Q Gao

GTID:2417330602452462

Subject:Statistics

Abstract/Summary:

Label noise is an important problem in classification and has many potential negative consequences, one of the most typical being reduced prediction accuracy. The existing literature on label noise follows two main approaches. The algorithm-level approach aims to design robust supervised learning algorithms that are little affected by label noise. The data-level approach focuses on identifying and removing mislabeled instances or correcting their labels. Algorithm-level methods, however, are modifications of particular traditional machine learning algorithms and therefore lack versatility. Data-level approaches have some advantages: handling label noise is separated from training the classifier, and most researchers consider that the processed data can be applied in a wider range of situations.

The data-level approach comprises two major methods, noise removal and noise correction. Compared with noise removal, noise correction is the better choice. On the one hand, important information may be lost when noisy instances are removed outright. On the other hand, removing noisy data may be prohibitively expensive when the cost of collecting data is high. This work is therefore concerned with the correction of label noise.

First, estimating the label noise rate of a data set provides useful information for label noise correction. Because most existing methods for estimating the label noise rate apply only to binary classification, we propose a new estimation method. It aims to identify potentially mislabeled instances so as to supply more useful information to the correction process, and it consists of three steps. To start with, a kNN classifier with leave-one-out cross-validation is used to derive a probability estimate for each instance in the data set. Next, thresholds are determined for detecting anomalous instances; the threshold for a class is the mean probability estimate of all instances in that class. Finally, the potentially mislabeled instances are counted and their percentage of all instances is computed. The algorithm handles multi-class as well as binary classification.

Second, existing label noise correction algorithms usually adopt either supervised learning or unsupervised learning, yet the two focus on different aspects of the data. Combining their characteristics provides more useful information for label noise correction and thus improves its accuracy. This thesis therefore designs a label noise correction algorithm that combines supervised and unsupervised learning, based on the K-means and kNN algorithms. The proposed method runs clustering on the training set one or more times, then estimates instance labels by majority voting and combines them with the noise rate estimate to derive a confidence for each label. Finally, according to this confidence and to majority voting between clusters, the labels of the training data are corrected.

To evaluate the performance of the proposed algorithm, we use the following criteria: label accuracy, model quality and AUC. Extensive experiments on real-world data sets show that, compared with several existing correction methods, our approach successfully corrects noisy labels and improves data quality in many cases, which in turn allows the classifier to achieve higher prediction accuracy.
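The three-step noise rate estimation described in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation, and several details are assumptions because the abstract does not specify them: Euclidean distance, a hypothetical default of `k=3`, the "probability estimate" taken as the fraction of an instance's k nearest neighbours that share its given label, and an instance flagged only when its estimate is strictly below its class threshold. The function names are hypothetical as well.

```python
import math
from collections import defaultdict
from fractions import Fraction


def loo_knn_label_probability(X, y, k=3):
    """Step 1 (assumed form): for each instance, the fraction of its k
    nearest neighbours, with the instance itself left out, that carry
    the same label as the instance."""
    probs = []
    for i, (xi, yi) in enumerate(zip(X, y)):
        # Leave-one-out: rank every *other* instance by Euclidean distance.
        ranked = sorted(
            (math.dist(xi, xj), j) for j, xj in enumerate(X) if j != i
        )
        neighbours = [j for _, j in ranked[:k]]
        agree = sum(1 for j in neighbours if y[j] == yi)
        # Fractions keep the later comparison against the class mean exact.
        probs.append(Fraction(agree, len(neighbours)))
    return probs


def estimate_noise_rate(X, y, k=3):
    """Return (estimated noise rate, indices of suspected instances)."""
    probs = loo_knn_label_probability(X, y, k)

    # Step 2: the threshold for a class is the mean estimate of its instances.
    by_class = defaultdict(list)
    for p, label in zip(probs, y):
        by_class[label].append(p)
    thresholds = {c: sum(ps) / len(ps) for c, ps in by_class.items()}

    # Step 3: count instances below their class threshold (assumed: strictly).
    suspects = [
        i for i, (p, label) in enumerate(zip(probs, y))
        if p < thresholds[label]
    ]
    return len(suspects) / len(y), suspects
```

Because thresholds are computed per class and nothing assumes exactly two labels, the same sketch applies to multi-class data, which matches the motivation given in the abstract. Calling `estimate_noise_rate(X, y)` with `X` as a list of numeric feature tuples and `y` as the corresponding list of labels returns the estimated rate together with the indices of the suspected instances.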

Keywords/Search Tags:

Label noise, Noise correction, Noise rate estimation, Classification

Related items

1	Influence Comparison Of Continuous Noise And Intermittent Noise On Children's Behavior
2	Research On The Dynamic Behavior Of Several Typical Nonlinear Systems Excited By Random Noise
3	Research On The Phenomenon Of Inverse Stochastic Resonance Under Levy Noise And Autapse
4	Resonance Dynamics Of Neurons And Neuron Networks Under The Action Of Phase Noise And Electromagnetic Induction
5	A Study On The Influence Of Learning Motivation, Illumination And Noise On Learning Action Of Students In Physical Education
6	Research On Signal Detection And Fusion Based On Chaotic Noise Background
7	The Effect Of Moderate Exercise On Plasm IL-6,8 And TNF-α In Rats After Noise Stress
8	Education Policy & Environmental Justice: Noise Pollution, Education Performance Indicators and the Perpetuation of Socioeconomic Statu
9	Chronic Noise Exposure Causes Persistence Of Tau Hyperphosphorylation And Formation Of NFT Tau In The Rat Hippocampus And Prefrontal Cortex
10	Acoustic Model Training Based On Data Noise And Text-speech Alignment