
Research On Classification Of Imbalanced Data Based On Convolutional Neural Network

Posted on: 2022-03-10
Degree: Master
Type: Thesis
Country: China
Candidate: Y Q Huang
Full Text: PDF
GTID: 2518306740962559
Subject: Software engineering
Abstract/Summary:
Imbalance problems are common in many application fields, such as medical diagnosis, text classification, and fault monitoring. Traditional classification methods do not take the imbalance of the data into account, which leads to poor classification performance on imbalanced data. In the current big-data context, deep learning, as a new research direction of machine learning, has achieved remarkable results in data mining and other fields. Its representative algorithm, the convolutional neural network, has representation-learning ability and is an efficient data-mining tool. When convolutional neural networks are applied to imbalanced classification tasks, however, the training process is adversely affected by the imbalance and the classification accuracy on minority classes drops. In addition, not all evaluation metrics are suitable for assessing classifier performance on imbalance problems. In response to these problems, this thesis takes the convolutional neural network as the training model, combines oversampling and ensemble learning with the convolutional neural network, improves its loss function, and studies solutions to the imbalance problem at both the data level and the algorithm level. The main research work is summarized as follows:

1. To address the tendency of traditional oversampling methods to generate noisy data, an oversampling method named DPCSMOTE is proposed, which combines the density peak clustering algorithm with the SMOTE algorithm. The method first uses the density peak algorithm to partition the input data into clusters, then selects the clusters that need to be oversampled and determines the number of samples to synthesize for each, and finally applies SMOTE within those clusters (a rough sketch is given below). Experiments on different data sets verify the effectiveness of the method. In addition, an evaluation metric for imbalance problems, OFm, is proposed. OFm pays more attention to the recognition rate of minority classes from the perspective of misclassification cost, while also accounting for the recognition rate of majority classes from the perspective of the confusion matrix, so it is better suited to imbalance problems. Experimental results show that OFm evaluates classifier performance on imbalanced classification tasks more effectively and comprehensively.

2. To address the degraded performance of convolutional neural networks on imbalanced data, the cross-entropy loss function is improved and a cost-sensitive loss function, FCELoss, is proposed for use during training. By assigning different weights to the different classes in the imbalanced data and taking into account the Euclidean distance between the model's predicted output and the correct label, different cost losses are assigned to the majority and minority classes, so that the loss function pays more attention to the minority classes and the model's recognition rate on them improves (a sketch also follows below). Experiments on data sets with different imbalance ratios show that the proposed method improves the classification performance of the convolutional neural network on imbalanced data.
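As an illustration of point 1, the following is a minimal sketch of the DPCSMOTE idea: density-peak clustering over the minority class, followed by SMOTE-style interpolation inside the retained clusters. The cutoff-distance heuristic, the rule for filtering small clusters, and the per-cluster synthesis quota are assumptions made for this example, and the function names are hypothetical.

```python
import numpy as np

def density_peak_clusters(X, n_clusters=3, dc=None):
    """Simplified density-peak clustering: local density rho and distance
    delta to the nearest denser point; points with the largest rho*delta
    become cluster centers."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    if dc is None:
        dc = np.percentile(d[d > 0], 2)              # cutoff distance (assumed heuristic)
    rho = np.exp(-(d / dc) ** 2).sum(axis=1)         # Gaussian-kernel local density
    order = np.argsort(-rho)                         # indices by decreasing density
    delta = np.zeros(len(X))
    nearest_higher = np.zeros(len(X), dtype=int)
    for rank, i in enumerate(order):
        if rank == 0:
            delta[i], nearest_higher[i] = d[i].max(), i
            continue
        higher = order[:rank]
        j = higher[np.argmin(d[i, higher])]
        delta[i], nearest_higher[i] = d[i, j], j
    centers = np.argsort(-(rho * delta))[:n_clusters]
    if order[0] not in centers:                      # densest point must anchor a cluster
        centers[-1] = order[0]
    labels = np.full(len(X), -1)
    labels[centers] = np.arange(n_clusters)
    for i in order:                                  # inherit the label of the nearest denser neighbour
        if labels[i] == -1:
            labels[i] = labels[nearest_higher[i]]
    return labels

def dpcsmote(X_min, n_new, n_clusters=3, k=5, seed=0):
    """Cluster the minority class with density peaks, then generate
    SMOTE-style synthetic samples inside each retained cluster."""
    rng = np.random.default_rng(seed)
    labels = density_peak_clusters(X_min, n_clusters=n_clusters)
    kept = [c for c in range(n_clusters) if (labels == c).sum() > k]   # drop clusters too small to interpolate in
    total = sum((labels == c).sum() for c in kept)
    synthetic = []
    for c in kept:
        Xc = X_min[labels == c]
        quota = int(round(n_new * len(Xc) / total))                    # larger clusters synthesize more samples
        for _ in range(quota):
            i = rng.integers(len(Xc))
            nn_order = np.argsort(np.linalg.norm(Xc - Xc[i], axis=1))
            j = rng.choice(nn_order[1:k + 1])                          # one of the k nearest neighbours
            synthetic.append(Xc[i] + rng.random() * (Xc[j] - Xc[i]))   # interpolate on the connecting segment
    return np.asarray(synthetic)
```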
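For point 2, the abstract does not spell out the exact form of FCELoss, so the following PyTorch sketch is only one plausible interpretation: per-class cost weights combined multiplicatively (an assumption) with the Euclidean distance between the softmax output and the one-hot label, scaling the standard cross-entropy term.

```python
import torch
import torch.nn.functional as F

class FCELoss(torch.nn.Module):
    """Cost-sensitive cross-entropy sketch: class weights favour the minority
    class, and the Euclidean distance between the predicted distribution and
    the one-hot label scales each sample's loss."""

    def __init__(self, class_weights):
        super().__init__()
        # class_weights: shape (num_classes,); larger values for minority classes.
        self.register_buffer("class_weights",
                             torch.as_tensor(class_weights, dtype=torch.float))

    def forward(self, logits, targets):
        probs = F.softmax(logits, dim=1)
        one_hot = F.one_hot(targets, num_classes=logits.size(1)).float()
        ce = F.cross_entropy(logits, targets, reduction="none")   # per-sample cross-entropy
        dist = torch.linalg.norm(probs - one_hot, dim=1)          # Euclidean distance to the label
        cost = self.class_weights[targets]                        # per-class misclassification cost
        return (cost * dist * ce).mean()

# Usage: give the minority class a larger cost weight than the majority class.
criterion = FCELoss(class_weights=[1.0, 5.0])
loss = criterion(torch.randn(8, 2), torch.randint(0, 2, (8,)))
```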
3. To improve the recognition rate of minority classes in imbalance problems, an ensemble learning method based on convolutional neural networks is proposed. The method first divides the imbalanced data set into multiple balanced training subsets and uses them to train base classifiers. While training the base classifiers, the weights of the correctly classified minority samples and of all majority samples are gradually reduced, which is equivalent to increasing the weights of the misclassified minority samples, so that these misclassified minority samples receive more attention in subsequent training and the minority-class recognition rate improves. Because minority samples in imbalance problems are usually more costly to misclassify, training the base classifiers in this way lets the misclassified minority samples receive more attention during training and thereby improves the model's recognition rate on minority classes. The experimental results verify the effectiveness of the algorithm (a sketch of the training loop is given below).
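For point 3, the following sketch illustrates the described training scheme, with scikit-learn decision trees standing in for the CNN base classifiers. The balanced-subset construction, the shrink factor, the weight normalisation, and the majority vote at prediction time are assumptions; labels are assumed to be integer-coded, with the minority class identified by `minority`.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier   # stand-in for the CNN base classifier

def balanced_subsets(y, minority, n_subsets, rng):
    """Pair the full minority class with an equally sized random draw
    from the majority class to form each balanced training subset."""
    min_idx = np.where(y == minority)[0]
    maj_idx = np.where(y != minority)[0]
    for _ in range(n_subsets):
        maj_sample = rng.choice(maj_idx, size=len(min_idx), replace=False)
        yield np.concatenate([min_idx, maj_sample])

def train_ensemble(X, y, minority=1, n_subsets=5, shrink=0.5, seed=0):
    """After each base classifier, shrink the weights of correctly classified
    minority samples and of all majority samples, so that misclassified
    minority samples carry more weight in the remaining rounds."""
    rng = np.random.default_rng(seed)
    w = np.ones(len(y))
    models = []
    for idx in balanced_subsets(y, minority, n_subsets, rng):
        clf = DecisionTreeClassifier(max_depth=5, random_state=seed)
        clf.fit(X[idx], y[idx], sample_weight=w[idx])
        pred = clf.predict(X)
        down = (y != minority) | (pred == y)          # majority samples, or correctly classified minority
        w[down] *= shrink                             # equivalent to up-weighting misclassified minority samples
        w /= w.mean()                                 # keep weights on a stable scale
        models.append(clf)
    return models

def predict_ensemble(models, X):
    """Combine the base classifiers by majority vote."""
    votes = np.stack([m.predict(X) for m in models]).astype(int)
    return np.apply_along_axis(lambda c: np.bincount(c).argmax(), 0, votes)
```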
Keywords/Search Tags:Imbalance classification, Density peak, SMOTE algorithm, Convolutional neural network, Cost-sensitive loss function, Ensemble learning