A New Method To Solve The Imbalance Problem

Posted on: 2020-09-19
Degree: Master
Type: Thesis
Country: China
Candidate: H Y Ma
GTID: 2428330572469691
Subject: Applied Statistics
Abstract/Summary:
Classification is an important application of machine learning, and many real-world classification problems are imbalanced. The imbalance problem is generally addressed at three levels: data preprocessing, algorithm modification, and post-prediction processing. Among data preprocessing methods, SMOTE oversampling is the most classical, and the RWO-Sampling algorithm was proposed to address the deficiencies of SMOTE. This thesis improves the RWO-Sampling method in two ways, giving the EROS and VC-EROS methods: the first relies on the correlation coefficient matrix to identify pairs of correlated variables, and the second uses the idea of variable clustering to divide correlated variables into groups. Once the correlated pairs or groups have been identified, the variable values of new samples are derived by generating random numbers from a multivariate normal distribution. Variables that are not assigned to any pair or group are treated as independent, and their values are derived in the same way as in the RWO-Sampling method. The VC-EROS method can adjust the upper limit on the number of variables in each group according to the actual situation. Multiple sets of simulation experiments are designed to verify the theory, varying the degree of correlation between variables and the distributions that the two classes of samples follow. The new methods are also applied to real data for medical insurance fraud identification. The results show that both the EROS method and the VC-EROS method effectively relax the underlying assumption of the RWO-Sampling method, and that the VC-EROS method is the more effective of the two. In addition, this thesis combines the VC-EROS method with the AdaBoost idea to form a new Data-sampling IDBoosting method, VCEROS-Boost. Experiments verify that the combination markedly improves the accuracy of the classifier.
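The generation step described in the abstract can be illustrated with a rough Python sketch. It is not the thesis's implementation, and several details are assumptions made only for illustration: the greedy pairing rule and the 0.5 correlation threshold, centering the multivariate normal on a randomly chosen minority sample, scaling the sample covariance by 1/n, and the RWO-style update x' = x - (sigma / sqrt(n)) * r with r ~ N(0, 1) for independent variables. The function names are hypothetical.

```python
import numpy as np

def correlated_pairs(X, threshold=0.5):
    """Greedily pair variables with |correlation| >= threshold (illustrative rule)."""
    corr = np.corrcoef(X, rowvar=False)
    d = X.shape[1]
    # Candidate pairs, strongest absolute correlation first.
    candidates = sorted(
        ((abs(corr[i, j]), i, j) for i in range(d) for j in range(i + 1, d)),
        reverse=True,
    )
    used, pairs = set(), []
    for c, i, j in candidates:
        if c >= threshold and i not in used and j not in used:
            pairs.append([i, j])
            used.update((i, j))
    independent = [k for k in range(d) if k not in used]
    return pairs, independent

def oversample(X_min, n_new, threshold=0.5, rng=None):
    """Generate n_new synthetic minority samples from X_min (n samples x d variables)."""
    rng = np.random.default_rng(rng)
    n = X_min.shape[0]
    pairs, independent = correlated_pairs(X_min, threshold)

    # Each synthetic sample starts from a randomly chosen real minority sample.
    base = X_min[rng.integers(0, n, size=n_new)]
    new = base.copy()

    # Correlated pairs: joint perturbation drawn from a multivariate normal
    # whose covariance is the sample covariance scaled by 1/n (assumption).
    for g in pairs:
        cov = np.cov(X_min[:, g], rowvar=False) / n
        noise = rng.multivariate_normal(np.zeros(len(g)), cov, size=n_new)
        new[:, g] = base[:, g] + noise

    # Independent variables: RWO-style random walk, one variable at a time.
    if independent:
        sigma = X_min[:, independent].std(axis=0, ddof=1)
        r = rng.standard_normal((n_new, len(independent)))
        new[:, independent] = base[:, independent] - sigma / np.sqrt(n) * r
    return new
```

Drawing the perturbation for a correlated pair jointly, rather than one variable at a time, is what lets the synthetic samples keep the correlation structure of the minority class. A variable-clustering variant in the spirit of VC-EROS would replace `correlated_pairs` with a grouping step that allows more than two variables per group, up to a chosen limit.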
Keywords/Search Tags:Unbalance problem, Oversampling, Integration method