The Research Based On Logistic Algorithm And Data Sampling Of Unbalanced Classification Data

Posted on: 2018-10-23
Degree: Master
Type: Thesis
Country: China
Candidate: Y Xu
Full Text: PDF
GTID: 2347330536483959
Subject: Statistics, application statistics
Abstract/Summary:
With the development of the economy and of science and technology, we live in an age of information explosion: big data is everywhere, and classification data is very common in real life. Previous classification methods, such as linear discriminant analysis, quadratic discriminant analysis, support vector machines (SVM), the logistic model and boosting, mostly focus on balanced data. Because these methods assume roughly balanced classes, a high overall prediction accuracy on imbalanced data is often attributable to the majority class, while the classification accuracy of the minority class is ignored. Traditional classification methods therefore cannot be applied directly to imbalanced data. For imbalanced binary classification problems, this thesis proposes methods at two levels, the logistic algorithm and data sampling, to improve the classification accuracy of the minority class on the basis of the logistic model. Ordinary logistic classification usually takes alpha equal to 0.5 as the threshold; to deal with imbalanced data, this thesis proposes adaptive threshold selection to increase the classification accuracy of the minority class. At the data sampling level, the idea is to apply stratified sampling to the majority class to generate several approximately balanced subsets, which are then classified with machine learning methods such as the logistic model, random forests, support vector machines (SVM) and neural networks. Finally, the proposed methods are applied to credit card default data, and the results on real data confirm that they can effectively improve classification performance on imbalanced data.
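The two ideas in the abstract, an adaptive threshold in place of the fixed 0.5 cutoff and approximately balanced subsets built from the majority class, can be illustrated with a minimal sketch. The abstract does not say which criterion selects the threshold or how the strata are defined, so the code below makes two assumptions: it picks the cutoff that maximises the G-mean (the geometric mean of the recall on each class), and it uses a plain random partition of the majority class as a stand-in for stratified sampling. The function names `g_mean`, `choose_threshold` and `balanced_subsets` are hypothetical and are not taken from the thesis.

```python
import math
import random


def g_mean(probs, labels, threshold):
    """Geometric mean of minority recall (label 1) and majority recall (label 0)."""
    tp = fn = tn = fp = 0
    for p, y in zip(probs, labels):
        pred = 1 if p >= threshold else 0
        if y == 1:
            tp += pred
            fn += 1 - pred
        else:
            fp += pred
            tn += 1 - pred
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return math.sqrt(sensitivity * specificity)


def choose_threshold(probs, labels, grid=None):
    """Return the first cutoff on the grid that maximises the G-mean.

    Assumption: the G-mean is used here only as an example criterion.
    """
    if grid is None:
        grid = [i / 100 for i in range(1, 100)]
    return max(grid, key=lambda t: g_mean(probs, labels, t))


def balanced_subsets(labels, seed=0):
    """Split majority indices into chunks of about minority size and pair
    each chunk with all minority indices.

    Assumption: a random partition replaces the stratified sampling
    described in the abstract, whose strata are not specified.
    """
    rng = random.Random(seed)
    minority = [i for i, y in enumerate(labels) if y == 1]
    majority = [i for i, y in enumerate(labels) if y == 0]
    rng.shuffle(majority)
    n_subsets = max(1, len(majority) // max(1, len(minority)))
    chunks = [majority[k::n_subsets] for k in range(n_subsets)]
    return [sorted(chunk + minority) for chunk in chunks]


# Toy data: 8 majority cases (label 0) and 2 minority cases (label 1).
# The probabilities are made-up stand-ins for fitted logistic scores.
labels = [0] * 8 + [1] * 2
probs = [0.05, 0.1, 0.1, 0.15, 0.2, 0.2, 0.25, 0.3, 0.35, 0.45]

threshold = choose_threshold(probs, labels)
subsets = balanced_subsets(labels)
```

On this toy data the fixed 0.5 cutoff predicts every case as majority, so the minority recall is zero, while the selected cutoff separates the two classes. In practice the threshold would be chosen on held-out data, and each balanced subset would be used to fit one classifier whose predictions are then combined.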
Keywords/Search Tags:Unbalanced data, Logistic model, Classification, Classification performance