
Research on Resampling Ensemble Algorithm for Imbalanced Credit Data

Posted on: 2023-11-28
Degree: Master
Type: Thesis
Country: China
Candidate: Y J Yuan
Full Text: PDF
GTID: 2558307097979109
Subject: Computer Science and Technology
Abstract/Summary:
With the development of financial technology, online loans, which offer simple borrowing procedures and fast disbursement, have become more popular than traditional loans when capital turnover is tight. Although the development of P2P (peer-to-peer) lending over the past decade has exposed many problems, the demand for online loans has continued unabated. Once a borrower acts dishonestly, the interests of the owner of the loan funds are seriously damaged. To reduce this risk, it is very important to establish an effective credit risk assessment model. The main obstacle to building such a model is imbalanced credit data: in credit scoring data, non-default samples far outnumber default samples. Most existing methods for handling imbalanced credit data aim to improve the classification of defaulters while ignoring the classification of non-defaulters. Because credit lending is profit-oriented, a decline in classification performance on non-defaulters leads to the loss of customers with good credit. In addition, existing methods mainly consider the skew in the number of samples and pay little attention to the spatial distribution of the data, which makes samples near the decision boundary more likely to be misclassified. Given these problems, the research in this thesis is as follows.

First, existing methods for data imbalance sacrifice the performance of the majority class to improve the performance of the minority class. To address this, the thesis proposes an ensemble model based on one-class and binary classification (EMOCB) for credit scoring. The model uses bagging sampling and random undersampling as resampling techniques to mitigate the data imbalance, and uses one-class classification to learn the majority class samples again so that majority class performance is preserved. Finally, Bagging is used to combine the one-class and binary classifiers to improve the final classification result. The effectiveness of EMOCB is verified on the Lending Club, Prosper, and Paipaidai datasets. EMOCB improves the performance of the minority class while preserving the performance of the majority class.

Second, samples near the class decision boundary are more likely to be misclassified than samples far from it. To address this, the thesis proposes a resampling ensemble model based on the class boundary (REMCB). The model uses boundary sample division to focus learning on samples near the decision boundary, then applies K-means-based random undersampling to select representative samples, then employs a conditional tabular generative adversarial network to generate minority class samples and address the data imbalance, and finally uses a Bagging ensemble to improve classification. The effectiveness of REMCB is verified on the Lending Club, Prosper, and Paipaidai datasets. Compared with other methods, the REMCB model has better overall performance.
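The EMOCB idea described above (bootstrap bags, random undersampling for a binary learner, a one-class learner fitted on the majority class, and a vote over both) can be illustrated with a minimal sketch. This is not the thesis's implementation: the nearest-centroid binary learner, the centroid-plus-radius one-class learner, the 0.95 quantile, the tie rule, the function names, and the toy data are all assumptions chosen to keep the example dependency-free. Labels are assumed to be 0 for non-default (majority) and 1 for default (minority).

```python
import random

def centroid(rows):
    """Component-wise mean of a list of equal-length feature vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]

def dist(a, b):
    """Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def random_undersample(X, y, rng):
    """Randomly drop rows of the larger class until both classes are equal in size."""
    by_class = {0: [], 1: []}
    for row, label in zip(X, y):
        by_class[label].append(row)
    n = min(len(by_class[0]), len(by_class[1]))
    Xb, yb = [], []
    for label, rows in by_class.items():
        for row in rng.sample(rows, n):
            Xb.append(row)
            yb.append(label)
    return Xb, yb

def fit_ensemble(X, y, n_bags=11, quantile=0.95, seed=0):
    """Fit one binary and one one-class model per bootstrap bag (hypothetical design)."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_bags):
        # Bagging step: draw a bootstrap sample with replacement.
        picks = [rng.randrange(len(X)) for _ in range(len(X))]
        Xs = [X[i] for i in picks]
        ys = [y[i] for i in picks]
        if len(set(ys)) < 2:
            continue  # skip a bag that happens to contain a single class
        # Binary learner: nearest centroid on a balanced, undersampled bag.
        Xb, yb = random_undersample(Xs, ys, rng)
        c0 = centroid([r for r, l in zip(Xb, yb) if l == 0])
        c1 = centroid([r for r, l in zip(Xb, yb) if l == 1])
        models.append(("binary", c0, c1))
        # One-class learner: fitted on all majority rows in the bag.
        # A point counts as majority if it lies within the chosen distance quantile.
        majority = [r for r, l in zip(Xs, ys) if l == 0]
        c = centroid(majority)
        d = sorted(dist(r, c) for r in majority)
        radius = d[min(len(d) - 1, int(quantile * len(d)))]
        models.append(("one_class", c, radius))
    return models

def predict(models, row):
    """Majority vote over all learners; ties go to class 0 (an assumption)."""
    votes = []
    for kind, a, b in models:
        if kind == "binary":
            votes.append(0 if dist(row, a) <= dist(row, b) else 1)
        else:
            votes.append(0 if dist(row, a) <= b else 1)
    return 1 if 2 * sum(votes) > len(votes) else 0

# Toy imbalanced data (synthetic, for illustration only): 200 majority vs 20 minority rows.
data_rng = random.Random(42)
X = [[data_rng.gauss(0, 1), data_rng.gauss(0, 1)] for _ in range(200)]
X += [[data_rng.gauss(5, 1), data_rng.gauss(5, 1)] for _ in range(20)]
y = [0] * 200 + [1] * 20

models = fit_ensemble(X, y)
print(predict(models, [0.0, 0.0]), predict(models, [5.0, 5.0]))
```

In this sketch the one-class learner sees every majority row in the bag, including those the undersampling step discards for the binary learner, which is one way to read "learn the majority class samples again". A practical version would likely swap in stronger base learners.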
Keywords/Search Tags: Credit scoring, Data imbalance, One-class classification, Resampling method, Class decision boundary, Ensemble learning