With the advent of the big data era and continuous scientific and technological progress, artificial intelligence (AI) technology has gradually become an indispensable part of daily life. Seemingly mature AI technologies are widely applied across many aspects of life and have brought great convenience to the information industry. However, for AI technology to be applied more broadly, it still faces many challenges. In classification tasks, conventional basic models cannot adapt well to complex, imbalanced data, so developing robust learning algorithms is of great research significance.

The essence of classification is to mine high-dimensional semantic information from data and to assign samples with the same attributes to the same category. Existing classification models can achieve relatively high accuracy. When faced with imbalanced data, however, conventional learning models based on empirical risk minimization are easily affected by the prior distribution of the samples. In other words, a biased data distribution leads to a biased decision space, which in turn harms the robustness and generalization of the model. Most current research on imbalanced data focuses on adjusting the learner through methods such as balanced sampling before training, cost-sensitive learning during training, and decision compensation after training. Starting from the prior distribution of the data, this thesis studies methods for learning from imbalanced data at both the algorithm level and the sample level. The contents are as follows:

(1) Algorithm level: we propose an improved Probability Density Machine (PDM) algorithm based on shared nearest neighbor clustering. PDM is a recently proposed algorithm for class imbalance learning; it captures prior data distribution information well and demonstrates robust performance in various Class Imbalance Learning (CIL) applications. However, we also observe that PDM is sensitive to CIL data with varying density and/or small class separations. To address this problem, we introduce the non-parametric Shared Nearest Neighbor (SNN) clustering technique into the PDM procedure and propose a new SNN-PDM algorithm. In particular, SNN adapts well to varying densities and captures small separations. We evaluated the proposed algorithm on a large number of CIL datasets, and the results show that SNN-PDM significantly outperforms PDM and several previous methods.

(2) Sample level: we propose a feature-level interpolation generation (FIG) method to address class imbalance. Many studies have shown that deep models also face the challenge of class imbalance, and effectively augmenting minority-class data in the image domain has long been a problem. The traditional SMOTE oversampling method generates new samples by interpolating directly in the input space, which alleviates the overfitting caused by random oversampling; in image data, however, this augmentation cannot effectively improve the performance of deep models. To solve this problem, FIG transfers SMOTE interpolation from the input space to the encoding space of an autoencoder, in the hope that the interpolated encodings provide better guidance for generating diverse images.
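The SNN idea mentioned in (1) can be illustrated with a minimal sketch: two samples are similar to the extent that their k-nearest-neighbor lists overlap, which makes the measure insensitive to absolute density. The function name `snn_similarity` and the toy data below are illustrative, not part of the thesis's implementation.

```python
import numpy as np

def snn_similarity(X, k=3):
    """Shared Nearest Neighbor similarity: for each pair of samples,
    count how many points appear in both k-nearest-neighbor lists."""
    # Pairwise Euclidean distances.
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)          # exclude each point from its own list
    knn = np.argsort(d, axis=1)[:, :k]   # indices of the k nearest neighbors
    n = len(X)
    sim = np.zeros((n, n), dtype=int)
    for i in range(n):
        for j in range(i + 1, n):
            shared = len(set(knn[i]) & set(knn[j]))
            sim[i, j] = sim[j, i] = shared
    return sim

# Two 1-D groups at very different scales: SNN similarity is positive
# within a group and zero across groups, regardless of local density.
X = np.array([[0.0], [0.1], [0.2], [10.0], [11.0], [12.0]])
S = snn_similarity(X, k=2)
```

Because only neighbor-list overlap matters, the same threshold on `S` separates both the dense and the sparse group, which is the property exploited when SNN replaces density-based estimation inside PDM.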
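The core step of the FIG method in (2) can be sketched as ordinary SMOTE interpolation applied to latent codes rather than pixels. The sketch below assumes the minority images have already been mapped to encodings `Z_min` by an autoencoder (encoder and decoder are not shown); the function name `smote_in_latent_space` is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def smote_in_latent_space(Z_min, n_new, k=3):
    """SMOTE-style interpolation on minority-class encodings Z_min
    (one row per sample).  Each synthetic code lies on the segment
    between a sample and one of its k nearest minority neighbors;
    decoding such codes back to images is the FIG idea."""
    n = len(Z_min)
    d = np.linalg.norm(Z_min[:, None] - Z_min[None, :], axis=-1)
    np.fill_diagonal(d, np.inf)
    knn = np.argsort(d, axis=1)[:, :k]
    out = []
    for _ in range(n_new):
        i = rng.integers(n)             # pick a minority encoding at random
        j = knn[i][rng.integers(k)]     # one of its k nearest minority neighbors
        lam = rng.random()              # interpolation coefficient in [0, 1]
        out.append(Z_min[i] + lam * (Z_min[j] - Z_min[i]))
    return np.stack(out)

Z = rng.normal(size=(10, 4))   # stand-in for autoencoder codes of minority images
Z_new = smote_in_latent_space(Z, n_new=5)
```

Interpolating in the encoding space keeps each synthetic code inside the span of real minority codes, so the decoder sees inputs close to its training distribution, unlike pixel-space interpolation, which blends two images linearly.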