An Imbalanced Data Classification Based On Improved SVM

Posted on:2016-12-12

Degree:Master

Type:Thesis

Country:China

Candidate:W R Zhang

Full Text:PDF

GTID:2348330536955074

Subject:Computer technology

Abstract/Summary:

Imbalanced data classification is an important topic in machine learning. Because positive samples are extremely valuable, identifying them correctly is the focus of this kind of classification. Traditional classification methods usually assume that the class distribution is roughly balanced and that the misclassification costs of the different classes are similar, and they use overall prediction accuracy as the performance criterion. As a result, negative samples are recognized at a high rate and positive samples at a low rate, which leads to undesirable classification results.

To address the imbalanced class distribution, this thesis proposes SDSMOTE, an improved SMOTE method for imbalanced data classification based on support degree, since the support degree is a significant reference for determining whether a sample is a boundary sample. The support degree of each sample is obtained first; samples with higher support degree are then chosen as the two endpoints, and new samples are synthesized by linear interpolation between them. The resulting samples take full advantage of the boundary samples, which makes the fuzzy class boundary clearer. Compared with SMOTE, SDSMOTE performs better both in synthesizing valuable new positive samples and in classification accuracy.

Because parameter selection is important when support vector machines (SVMs) process imbalanced data, this thesis also proposes an improved SVM method for imbalanced data classification, which uses an improved artificial bee colony algorithm to optimize the SVM. The improved artificial bee colony algorithm introduces the crossover idea of genetic algorithms, which accelerates convergence. A k-fold cross-validation method with an embedded clustering idea makes full use of the sample data and improves the stability of the algorithm. Experiments show that the improved SVM method is clearly superior in accuracy and F-value to the genetic algorithm.
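
The interpolation step mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's SDSMOTE implementation: the abstract does not define the support degree, so the `scores` argument below is a hypothetical, caller-supplied stand-in for it, and the names `interpolate`, `oversample`, and `top_k` are assumptions made only for this example. The sketch keeps the positive samples with the highest scores and creates each synthetic sample on the line segment between two of them.

```python
import random


def interpolate(x_a, x_b, gap):
    """Return the point x_a + gap * (x_b - x_a), with gap expected in [0, 1]."""
    return [a + gap * (b - a) for a, b in zip(x_a, x_b)]


def oversample(positives, scores, n_new, top_k, seed=0):
    """Create n_new synthetic positive samples by linear interpolation.

    positives: list of feature vectors (lists of floats) of the positive class.
    scores:    one number per positive sample; a hypothetical stand-in for the
               support degree, which the abstract does not define.
    top_k:     how many of the highest-scoring samples may serve as endpoints.
    """
    if top_k < 2 or top_k > len(positives):
        raise ValueError("top_k must be between 2 and len(positives)")
    rng = random.Random(seed)
    # Rank sample indices by score, highest first, and keep the top_k samples.
    ranked = sorted(range(len(positives)), key=lambda i: scores[i], reverse=True)
    pool = [positives[i] for i in ranked[:top_k]]
    synthetic = []
    for _ in range(n_new):
        # Pick two distinct endpoints and a random position between them.
        x_a, x_b = rng.sample(pool, 2)
        synthetic.append(interpolate(x_a, x_b, rng.random()))
    return synthetic


positives = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
scores = [0.9, 0.8, 0.1]  # made-up values, for illustration only
new_samples = oversample(positives, scores, n_new=5, top_k=2)
```

With `top_k=2`, every synthetic sample lies between the two highest-scoring points, so the low-scoring point at `[5.0, 5.0]` never serves as an endpoint. Standard SMOTE instead interpolates between a positive sample and one of its nearest positive neighbours; pairing samples from a score-ranked pool is a simplification chosen here to mirror the abstract's description.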

Keywords/Search Tags:

SMOTE, Artificial Bee Colony Algorithm, SVM, Imbalanced Data Classification

Related items

1	Classification Learning Of Imbalanced Data Sets Based On Sampling Processing
2	Research On The Expansion And Classification Of Several Imbalanced Data Sets Based On C-SMOTE Algorithm
3	Research On Classification Methods For Imbalanced Data
4	Research And Application Of Imbalanced Data Classification
5	Research On Imbalanced Data Classification Methods For Industrial Big Data
6	Research And Application Of Imbalanced Data Classification Based On Oversampling Algorithm
7	The Study On Random-SMOTE For The Classification Of Imbalanced Data Sets
8	Research On Imbalanced Data Oversampling Classification Based On Constructive Covering Algorithm
9	Research On The Classification Of Imbalanced Data Sets Based On R-SMOTE
10	Research On Rotation Forest Algorithm For Imbalanced Data Classification Problem