
An Imbalanced Data Classification Based On Improved SVM

Posted on: 2016-12-12
Degree: Master
Type: Thesis
Country: China
Candidate: W R Zhang
GTID: 2348330536955074
Subject: Computer technology
Abstract/Summary:
Imbalanced data classification is an important topic in machine learning. Because positive samples are extremely valuable, identifying them correctly is the focus of the classification task. Traditional classification methods usually assume that the class distribution is roughly balanced and that the costs of misclassifying different classes are similar, and they take overall prediction accuracy as the measure of classification performance. As a result, negative samples are recognized at a high rate and positive samples at a low rate, which leads to an undesirable classification.

To address the imbalanced class distribution, this paper proposes SDSMOTE, an improved SMOTE method for imbalanced data classification based on support degree. The support degree is a significant reference for deciding whether a sample is a boundary sample. The method first obtains the support degree of the samples, then chooses samples with higher support degree as the two endpoints and synthesizes new samples by linear interpolation between them. The resulting samples take full advantage of the boundary samples, which makes the fuzzy class boundary clearer. Compared with SMOTE, SDSMOTE performs better both in synthesizing valuable new positive samples and in classification accuracy.

Because parameter selection is important when support vector machines (SVMs) process imbalanced data, this paper also proposes an improved SVM method for imbalanced data classification, in which an improved artificial bee colony algorithm optimizes the SVM. The improved artificial bee colony algorithm introduces the crossover idea of the genetic algorithm, which accelerates convergence. A k-fold cross-validation method with an embedded clustering idea makes full use of the sample data and improves the stability of the algorithm. Experiments show that the improved SVM method is clearly superior in accuracy and F-value compared with the genetic algorithm.
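
The SDSMOTE interpolation step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not define how the support degree is computed, so the code below simply takes one score per positive sample from the caller as a stand-in for it. The function names `interpolate` and `synthesize` and the `top_fraction` cutoff are hypothetical and are not taken from the thesis; this is an illustration of SMOTE-style linear interpolation between high-scoring endpoints, not the author's implementation.

```python
import random


def interpolate(a, b, gap):
    """Return the point a + gap * (b - a) on the segment from a to b."""
    return [x + gap * (y - x) for x, y in zip(a, b)]


def synthesize(minority, scores, n_new, top_fraction=0.5, seed=0):
    """Create n_new synthetic positive samples by linear interpolation.

    minority: list of feature vectors (lists of floats) of the positive class.
    scores: one caller-supplied score per sample. ASSUMPTION: this stands in
        for the support degree, which the abstract does not define.
    top_fraction: hypothetical cutoff; only the highest-scoring fraction of
        samples may serve as interpolation endpoints.
    """
    if len(minority) != len(scores):
        raise ValueError("need exactly one score per sample")
    rng = random.Random(seed)
    # Rank sample indices from highest to lowest score.
    ranked = sorted(range(len(minority)), key=lambda i: scores[i], reverse=True)
    keep = max(2, int(len(ranked) * top_fraction))
    candidates = ranked[:keep]
    if len(candidates) < 2:
        raise ValueError("need at least two samples to interpolate")
    new_samples = []
    for _ in range(n_new):
        # Pick two distinct high-scoring endpoints and a random position
        # on the segment between them.
        i, j = rng.sample(candidates, 2)
        new_samples.append(interpolate(minority[i], minority[j], rng.random()))
    return new_samples
```

Calling `synthesize(points, scores, n_new=10)` returns ten new vectors, each lying on a segment between two of the highest-scoring positive samples. Plain SMOTE instead interpolates between a positive sample and one of its nearest positive neighbors, without a score-based filter.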
Keywords/Search Tags: SMOTE, Artificial Bee Colony Algorithm, SVM, Imbalanced Data Classification