
Prediction Of Overdue Loan Based On Imbalanced Data

Posted on: 2020-07-13  Degree: Master  Type: Thesis
Country: China  Candidate: Y Q Xie  Full Text: PDF
GTID: 2428330575986349  Subject: Software engineering
Abstract/Summary:
In personal credit lending for Internet finance, data tracking makes it possible to evaluate users' creditworthiness, match users with appropriate products, detect fraud, and reduce bad debts. In business analysis, profit and risk are proportional, and risk pricing is based on profit maximization. Rules and models can quantify a customer's credit. In practice, however, the data is not only large and high-dimensional but also imbalanced. Because overdue customers are always far fewer than non-overdue customers, the effect of this imbalance on the credibility of the model must be taken into account during modeling. Many traditional algorithms, such as the k-nearest neighbor (KNN) algorithm, are biased toward the majority class. In the final decision-making process, the minority class is dominated by the majority class and is misclassified, so the probability that minority-class samples are correctly identified falls. Accuracy is therefore not a suitable measure of model quality for prediction on imbalanced data. In addition, data obtained from the Internet, such as user consumption information, telecommunication information, and multi-platform lending information, has many dimensions, so feature selection is also important in high-dimensional data prediction.

To address the imbalance and high dimensionality of Internet financial data, this thesis focuses on data-level and algorithm-level processing of imbalanced datasets. Python is used to analyze the imbalanced GiveMeSomeCredit dataset from the Kaggle competition platform, in order to clarify the advantages and disadvantages of various imbalanced-data processing methods and to choose a method suitable for predicting overdue loans in Internet finance. Using data from a competition held by an Internet finance company as experimental data, the work carries out business understanding, data preprocessing, feature screening and derivation, modeling, and evaluation.

A framework for an overdue prediction model is proposed and implemented in accordance with CRISP-DM, a standard data mining process. Combined with feature selection, several algorithms based on Boosting and Bagging ideas are used together, and the resulting models are then fused. The framework trains LightGBM and XGBoost models and tunes their parameters. After feature selection on the high-dimensional data in three ways, CatBoost and an algorithm that combines sample balancing with classification are used to train further models. Finally, the models are fused by logistic regression to improve the prediction of the minority class. The experimental results show that balancing the data after screening the high-dimensional imbalanced features, together with model fusion, is suitable for overdue prediction in Internet financial credit risk and improves recognition of the minority class.
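
The claim that accuracy is not a suitable measure on imbalanced data can be illustrated with a minimal sketch. The figures below are hypothetical and are not taken from the thesis or from GiveMeSomeCredit: 1000 customers, of whom 50 are overdue, scored by two made-up predictors. The function names are likewise assumptions chosen for illustration.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn), treating label 1 (overdue) as the positive class."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 0 and p == 1:
            fp += 1
        elif t == 1 and p == 0:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def accuracy(y_true, y_pred):
    """Fraction of all samples that are classified correctly."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return (tp + tn) / len(y_true)


def recall(y_true, y_pred):
    """Fraction of overdue (minority) samples that are correctly identified."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return tp / (tp + fn) if (tp + fn) else 0.0


# Hypothetical imbalanced labels: 50 overdue (1), 950 non-overdue (0).
y_true = [1] * 50 + [0] * 950

# Predictor A: always predicts the majority class (non-overdue).
pred_majority = [0] * 1000

# Predictor B (made up): catches 40 of the 50 overdue customers,
# but wrongly flags 100 of the 950 non-overdue customers.
pred_flagging = [1] * 40 + [0] * 10 + [1] * 100 + [0] * 850

acc_majority = accuracy(y_true, pred_majority)
rec_majority = recall(y_true, pred_majority)
acc_flagging = accuracy(y_true, pred_flagging)
rec_flagging = recall(y_true, pred_flagging)

print("majority-only: accuracy=%.2f recall=%.2f" % (acc_majority, rec_majority))
print("flagging:      accuracy=%.2f recall=%.2f" % (acc_flagging, rec_flagging))
```

In this toy setting the majority-only predictor reaches 0.95 accuracy while identifying no overdue customers, whereas the second predictor has lower accuracy (0.89) but a minority-class recall of 0.80, which is why minority-class metrics are the more informative choice here.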
Keywords/Search Tags:Internet finance, Imbalanced data, LightGBM, High-dimensional data, Model fusion