Research On Data Augmentation Algorithm For Imbalanced Data Classification

Posted on:2024-03-19

Degree:Master

Type:Thesis

Country:China

Candidate:Y J Sun

Full Text:PDF

GTID:2568307064986079

Subject:Computer Science and Technology

Abstract/Summary:

Imbalanced data classification has become increasingly common and urgent in the era of big data. Traditional machine learning classifiers usually assume that the dataset is balanced and aim for accurate overall classification. However, when majority class samples far outnumber minority class samples, the classifier tends to learn the features of the majority class and ignore those of the minority class, which degrades its performance. Correctly identifying minority class samples, so that recognition performance is balanced across classes, has therefore become a research focus in classification. In real-life scenarios, accurately identifying minority class samples has an important impact on decision-making, for example in rare disease diagnosis in the medical field and in fraud detection and customer churn prediction in the financial field. Evaluating a classifier on such data also requires other metrics, such as precision and recall, to obtain a comprehensive and accurate assessment. Handling imbalanced data classification correctly is therefore both necessary and meaningful.

This thesis proposes a hybrid resampling algorithm for imbalanced classification of structured data, which combines oversampling based on conditional generative adversarial networks with undersampling based on distance screening. The generative adversarial network is transferred from unstructured data to structured data, and a conditional generative adversarial network is used to generate new samples of a specified class. Generated samples that fall within a certain distance threshold are added to the original imbalanced training set, which reduces the degree of imbalance and thereby addresses the imbalanced classification problem. Experimental results show that the algorithm performs better than 12 other imbalanced data classification methods on 37 imbalanced datasets, demonstrating that the approach can be transferred to structured data.

This thesis also investigates imbalanced data classification in the field of biology. Biological omics data typically have few samples and many features, and the algorithm RCGAN-DF is prone to overfitting when trained on datasets with many features. This thesis therefore also proposes a data augmentation and feature selection algorithm based on miRNA omics. The algorithm exploits the important role of miRNA in gene regulation and the relationships between miRNAs and their target genes in gene expression data to augment the dataset and select meaningful features for classification; reducing the number of features helps reduce overfitting of the trained model. Experimental results show that the algorithm performs better than traditional data augmentation and feature selection algorithms on three miRNA datasets and identifies significant biomarkers, demonstrating its feasibility in bioinformatics. The algorithm can support disease diagnosis and treatment and offers new ideas and methods for future bioinformatics research.
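
The idea of keeping only generated samples within a distance threshold can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the abstract does not say how the distance is measured, so the sketch assumes Euclidean distance to the nearest real minority sample, and it replaces the conditional generative adversarial network with Gaussian noise as a stand-in generator. The function name `screen_by_distance`, the threshold value, and the toy data are all hypothetical.

```python
import numpy as np

def screen_by_distance(candidates, minority, threshold):
    """Keep candidates whose Euclidean distance to the nearest
    real minority sample is at most `threshold` (assumed criterion)."""
    # Pairwise differences, shape (n_candidates, n_minority, n_features)
    diffs = candidates[:, None, :] - minority[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=2))
    nearest = dists.min(axis=1)
    return candidates[nearest <= threshold]

rng = np.random.default_rng(0)

# Toy imbalanced training set: 200 majority vs. 20 minority samples
majority = rng.normal(0.0, 1.0, size=(200, 2))
minority = rng.normal(3.0, 0.5, size=(20, 2))

# Stand-in for generator output (NOT a CGAN): noisy points near the minority class
candidates = rng.normal(3.0, 1.5, size=(500, 2))

# Screen the candidates, then add at most enough to balance the classes
kept = screen_by_distance(candidates, minority, threshold=0.5)
n_needed = len(majority) - len(minority)
kept = kept[:n_needed]

X = np.vstack([majority, minority, kept])
y = np.concatenate([np.zeros(len(majority)), np.ones(len(minority) + len(kept))])
print(f"minority size before: {len(minority)}, after: {len(minority) + len(kept)}")
```

In a real pipeline the `candidates` array would come from a trained conditional generator, and the threshold would have to be tuned per dataset.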

Keywords/Search Tags:

imbalanced data classification, conditional generative adversarial networks, deep learning, microRNA, feature selection

Related items

1	Research And Implementation Of Social Robot Detection Technology Based On Improved Conditional Generation Adversarial Network
2	Imbalanced Data Classification Analysis Based On Generative Adversarial Networks And Reinforcement Learning
3	Research On Imbalanced Image Classification Method Based On Generative Adversarial Networks
4	Research On Imbalanced Data Classification Method Based On Generation Model And Its Application
5	Study And Application On The Structure Improved Deep Convolutional Generative Adversarial Networks
6	Skin Segmentation Based On Conditional Generative Adversarial Networks And Face Color Classification
7	LiDAR Data Urban Terrain Classification Based On Deep Learning
8	Research On Classification Of Imbalanced Dataset Based On Generative Adversarial Networks
9	Research On Game Background Stylization Algorithm Based On Generative Adversarial Network
10	Research On Speech Enhancement Model Based On Conditional Deep Convolutional Generative Adversarial Networks