
Application Of RUSBoost Algorithm In Imbalanced Datasets

Posted on: 2019-01-21 | Degree: Master | Type: Thesis
Country: China | Candidate: X T Yin
GTID: 2429330563958863 | Subject: Applied statistics
Abstract/Summary:
Imbalanced classification is a common problem in practice, arising in areas such as disease diagnosis and financial fraud monitoring. However, many classification algorithms assume that the class distribution is balanced. When applied to imbalanced data, such a classifier may treat minority samples as noise, which reduces its ability to identify the minority class. When more minority samples cannot be obtained, the distribution of the imbalanced data can be changed with sampling techniques, or ensemble learning can be used to focus on the prediction of misclassified instances. The RUSBoost algorithm used in this thesis combines under-sampling with ensemble learning. Under-sampling shows its advantages most fully when the sample size is large: it balances the dataset and speeds up classification. The empirical analysis uses the financial ratios of bankrupt and healthy companies in the Polish manufacturing industry. The independent variables are 64 financial ratios, and the assessment indicators are AUC, sensitivity and G-mean. First, a series of comparative experiments was carried out on the results of RUSBoost and AdaBoost. Next, 10-fold cross-validation was used to select reasonable values for the imbalance ratio parameter and the number of boosting iterations of RUSBoost. Finally, the selected parameters were substituted into the algorithm to classify the financial data, and good classification results were obtained.
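The under-sampling step and two of the assessment indicators named in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: the function names `random_under_sample` and `sensitivity_and_g_mean` are hypothetical, the toy data is invented for illustration, and `ratio` is assumed to mean the number of majority rows kept per minority row, which may differ from the thesis's definition of the imbalance ratio parameter. RUSBoost applies an under-sampling step of this kind in every boosting round before fitting the weak learner; the boosting loop itself is omitted here.

```python
import math
import random


def random_under_sample(X, y, minority_label=1, ratio=1.0, seed=0):
    """Keep every minority row and a random subset of the majority rows.

    ratio = (majority rows kept) / (minority rows). This definition is an
    assumption made for illustration, not taken from the thesis.
    """
    rng = random.Random(seed)
    minority = [i for i, label in enumerate(y) if label == minority_label]
    majority = [i for i, label in enumerate(y) if label != minority_label]
    # Never try to keep more majority rows than actually exist.
    n_keep = min(len(majority), int(round(ratio * len(minority))))
    kept = sorted(minority + rng.sample(majority, n_keep))
    return [X[i] for i in kept], [y[i] for i in kept]


def sensitivity_and_g_mean(y_true, y_pred, positive=1):
    """Return (sensitivity, G-mean), where G-mean = sqrt(sensitivity * specificity)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return sensitivity, math.sqrt(sensitivity * specificity)


# Toy data (invented): 10 minority rows (label 1) and 90 majority rows (label 0).
X = [[i] for i in range(100)]
y = [1 if i < 10 else 0 for i in range(100)]
X_bal, y_bal = random_under_sample(X, y, ratio=1.0)

# Toy predictions (invented): 3 of 4 positives found, 1 of 6 negatives misflagged.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
sens, g_mean = sensitivity_and_g_mean(y_true, y_pred)
```

With `ratio=1.0` the balanced sample contains as many majority rows as minority rows. G-mean rewards a classifier only when it performs well on both classes, which is why it is commonly reported alongside sensitivity for imbalanced data.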
Keywords/Search Tags: Imbalanced data, Under-sampling, Financial data, Ensemble learning, RUSBoost algorithm