
A Contrast And Analysis On Improving Classification Performance Of Imbalanced Dataset Based On SMOTEBoosting And Multiple Classification Algorithms

Posted on: 2019-03-02
Degree: Master
Type: Thesis
Country: China
Candidate: S W Yin
GTID: 2370330545497465
Subject: Probability theory and mathematical statistics
Abstract/Summary:
With the continuous progress of data collection and Internet technology, more and more imbalanced dataset classification problems need to be solved in areas such as fraudulent transaction detection, network intrusion detection, web mining, direct marketing and medical diagnosis. This paper focuses on improving classification performance on imbalanced datasets. First, building on the two traditional approaches of assigning different weights to training samples and resampling the original dataset, Chawla et al. proposed the synthetic minority over-sampling technique (SMOTE) in 2002, and Shengguo Hu et al. proposed an improved form (MSMOTE) in 2009; these two methods are used to preprocess the dataset. Second, a boosting procedure is introduced for multiple classification algorithms, increasing the weight of misclassified samples in order to improve classification accuracy, and the resulting SMOTEBoosting model is applied to the field of financial alert. The classification algorithms considered include traditional statistical models such as classical logistic regression and linear discriminant analysis, machine learning algorithms such as decision tree and k-nearest neighbor, and methods that have emerged in recent years and attracted the attention of the scientific community, such as support vector machine and neural network. Finally, in comparing and evaluating the models, because the majority class in an imbalanced dataset has far more samples than the minority class, a classifier that labels every sample as majority can still achieve high accuracy, so the usual prediction accuracy is not an appropriate measure. This paper therefore introduces evaluation indicators for minority-class classification: precision, recall and F-score. Together with the ROC curve and its AUC value, these indicators are used to compare and analyze the classification performance obtained on different imbalanced datasets.
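
Two of the ideas in the abstract above can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the function names `smote_sketch` and `minority_metrics`, the parameter `k`, and the toy data are all assumptions made for illustration. The first function follows the basic SMOTE idea of interpolating between a minority sample and one of its nearest minority neighbours (simplified here by picking the base sample at random); the second computes accuracy alongside precision, recall and F-score for the minority class.

```python
import math
import random


def smote_sketch(minority, n_new, k=3, seed=0):
    """Generate n_new synthetic minority points (simplified SMOTE idea).

    Each synthetic point lies on the segment between a randomly chosen
    minority sample and one of its k nearest minority neighbours.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to base.
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


def minority_metrics(y_true, y_pred, positive=1):
    """Return (accuracy, precision, recall, f_score) for the positive class.

    Precision, recall and F-score are reported as 0.0 when undefined
    (a convention chosen for this sketch).
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)
    accuracy = correct / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f_score = 2 * precision * recall / denom if denom else 0.0
    return accuracy, precision, recall, f_score


# Toy data: 5 minority samples (label 1) and 95 majority samples (label 0).
y_true = [1] * 5 + [0] * 95
all_majority = [0] * 100  # a classifier that labels everything as majority
print(minority_metrics(y_true, all_majority))

# Oversample a small, made-up minority cloud in two dimensions.
minority_points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
print(smote_sketch(minority_points, n_new=3, k=2))
```

On the toy labels, the all-majority classifier reaches an accuracy of 0.95 while its recall and F-score for the minority class are 0, which is the reason the abstract gives for preferring minority-class indicators over plain accuracy.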
Keywords/Search Tags:Imbalanced Dataset, SMOTEBoosting, Classification Algorithms, Financial Alert