| Classification is a supervised learning task in which a model is trained on labeled data. Class-imbalanced datasets and datasets with label noise are often encountered in classification. Class imbalance biases the classification results towards the majority class, with poor recognition accuracy for the minority class. Label noise shifts the decision boundary, reduces the prediction performance of the model, and increases the complexity of the model. When label noise is present in an imbalanced dataset, it can have a large negative impact on the classifier. Sampling addresses class imbalance by increasing the number of minority class samples or decreasing the number of majority class samples, but it usually also increases the number of noisy samples or discards information in the data. In this thesis, we address the problem of classifying imbalanced datasets with label noise in two main aspects: (1) Existing sampling methods usually have some limitations, so this thesis introduces the concept of granular spheres. We propose a general sampling algorithm, called granular sphere oversampling, that is not limited to any specific dataset, classifier, or scenario. The decision boundary of the dataset is fitted by repeatedly dividing the spheres, and outlier samples are located outside the spheres. The purity of each sphere is then calculated, and noisy samples are located in the spheres with lower purity. Finally, the dataset is balanced by oversampling inside the spheres. Experiments show that granular sphere oversampling is more robust to noise on imbalanced, high-noise datasets. (2) A general weighted oversampling framework is proposed. For each minority class sample, the number of majority class samples among its K nearest neighbors is counted and used to assign a weight to that sample. The interpolation position of each synthetic sample is then specified precisely, so that synthetic samples lie closer to safe, clean samples and farther from dangerous samples. This weighting framework can be combined with multiple oversampling algorithms. Experiments show that a range of oversampling algorithms perform better when combined with this framework than in their original form. |
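
The neighbor-based weighting in the second contribution can be illustrated with a minimal sketch. The abstract does not give the weight formula or the interpolation rule, so both are assumptions here: the weight is taken as `1 - m/k`, where `m` is the number of majority class samples among the `k` nearest neighbors, and the synthetic point is pulled towards the endpoint with the higher weight. The function names `majority_neighbor_counts`, `safety_weights`, and `interpolate` are hypothetical and not taken from the thesis.

```python
import math
import random


def majority_neighbor_counts(X, y, minority_label, k=5):
    """For each minority sample, count majority-class points among its k nearest neighbors."""
    counts = {}
    for i, (xi, yi) in enumerate(zip(X, y)):
        if yi != minority_label:
            continue
        # Brute-force distances to every other sample (adequate for a sketch).
        others = sorted((math.dist(xi, xj), j) for j, xj in enumerate(X) if j != i)
        neighbors = [j for _, j in others[:k]]
        counts[i] = sum(1 for j in neighbors if y[j] != minority_label)
    return counts


def safety_weights(counts, k):
    """Assumed weighting (not from the thesis): 1 - m/k, so a minority sample
    whose neighbors are all majority points gets weight 0."""
    return {i: 1.0 - m / k for i, m in counts.items()}


def interpolate(x_a, w_a, x_b, w_b, rng):
    """Assumed interpolation rule: place a synthetic point on the segment a-b,
    biased towards the endpoint with the higher weight."""
    if w_a + w_b == 0:
        return None  # both endpoints look noisy, so skip this pair
    u = rng.uniform(0.05, 0.95)  # bounded away from 0 and 1 to keep the denominator positive
    # Fraction of the way from a to b; equals u when the weights are equal.
    t = (w_b * u) / (w_a * (1.0 - u) + w_b * u)
    return [a + t * (b - a) for a, b in zip(x_a, x_b)]


if __name__ == "__main__":
    # Toy data: six majority points near the origin, three clean minority
    # points near (5, 5), and one minority point inside the majority region.
    X = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (0.2, 0.8),
         (5, 5), (5, 6), (6, 5), (0.4, 0.4)]
    y = [0] * 6 + [1] * 4
    counts = majority_neighbor_counts(X, y, minority_label=1, k=2)
    weights = safety_weights(counts, k=2)
    rng = random.Random(0)
    print(weights)
    print(interpolate(X[6], weights[6], X[9], weights[9], rng))
```

With these assumed rules, a minority sample surrounded by majority points receives weight 0, so any synthetic sample generated from a pair that includes it collapses onto the clean endpoint instead of landing in the majority region.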