In China, banks still face limitations in reaching customer groups and identifying risk. For a long time they have been unable to communicate effectively with small and micro-sized borrowers, and access to loans remains far from universal. Against this background, loan assistance institutions have developed rapidly, helping banks with customer acquisition, risk control, financial assistance, credit assessment, and related work. Given this role, if a loan assistance institution cannot accurately review and assess customers' willingness and ability to repay, the bank will bear the resulting losses and the institution's business will be difficult to sustain. An efficient and accurate borrower default risk prediction model is therefore urgently needed to identify borrowers who may default in the future and to improve the risk prediction capability of loan assistance institutions.

This paper uses borrower data from the Youqianhua loan assistance institution to build a default risk prediction model for borrowers of financial institutions. The data are preprocessed as follows: missing and uninformative features are first removed, and then date, categorical (sub-type), and numerical features are each converted into suitable forms. Because the data set is imbalanced, an adaptive oversampling method is used to balance the classes, which also increases the diversity of the data. For the base models, both traditional and newer data mining algorithms are compared: support vector machine (SVM), random forest, BP neural network, and XGBoost. The four algorithms are trained and tuned on the training set, and their predictive performance is evaluated with accuracy, recall, F1 score, and AUC.

The results show that the SVM model performs poorly, trailing the other three algorithms on all four metrics. The random forest model performs well in recall, the BP neural network performs well in accuracy, and XGBoost performs very well in recall, F1 score, and AUC. The BP neural network and XGBoost are therefore selected as base models for integration, and the resulting ensemble is named the BP-XG integrated model. The BP-XG integrated model outperforms every single model in recall, accuracy, F1 score, and AUC, and shows strong classification performance and generalization ability in default risk prediction. In summary, the BP-XG integrated model is well suited to building default risk prediction models for loan assistance institutions: it can more accurately screen out borrowers who may default in the future and thereby reduce the institutions' losses. It thus provides a reference for incorporating the BP-XG integrated model into the risk control systems of loan assistance institutions.
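The preprocessing step is described only at a high level. The Python sketch below shows one common way to carry it out. The missing-value threshold, the reference date, and the function names (`drop_uninformative`, `days_since`, `one_hot`, `min_max`) are hypothetical and are not taken from the Youqianhua data set. In this sketch, dates become elapsed days, categorical (sub-type) fields become one-hot indicator columns, and numerical fields are min-max scaled.

```python
from datetime import date


def drop_uninformative(table, max_missing=0.5):
    """Hypothetical helper: drop columns that are mostly missing (None) or constant.

    `table` maps column name -> list of values; the 0.5 threshold is an assumption.
    """
    kept = {}
    for name, col in table.items():
        present = [v for v in col if v is not None]
        if len(present) < (1 - max_missing) * len(col):
            continue  # share of missing values exceeds max_missing
        if len(set(present)) <= 1:
            continue  # a constant column carries no information
        kept[name] = col
    return kept


def days_since(iso_date, reference):
    """Turn an ISO date string such as '2019-12-01' into days elapsed before `reference`."""
    return (reference - date.fromisoformat(iso_date)).days


def one_hot(values):
    """Encode a categorical (sub-type) column as 0/1 indicator columns, categories sorted."""
    categories = sorted(set(values))
    return categories, [[int(v == c) for c in categories] for v in values]


def min_max(values):
    """Rescale a numerical column to [0, 1]; a constant column maps to zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]
```

For example, `days_since("2019-12-01", date(2020, 1, 1))` gives 31, and `min_max([2.0, 4.0, 6.0])` gives `[0.0, 0.5, 1.0]`.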
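The abstract does not name the adaptive oversampling algorithm. Assuming it is ADASYN (adaptive synthetic sampling), the minimal pure-Python sketch below illustrates the idea. Minority samples that have more majority-class neighbours receive more synthetic points. Each synthetic point is interpolated between a minority sample and a randomly chosen minority neighbour. The function name `adasyn` and its defaults (`k=5`, `beta=1.0`, `seed=0`) are illustrative assumptions.

```python
import math
import random


def adasyn(X, y, minority=1, k=5, beta=1.0, seed=0):
    """Sketch of ADASYN oversampling (assumed method; hypothetical helper).

    X: list of feature vectors (lists of floats); y: list of 0/1 labels.
    Returns new lists with synthetic minority samples appended after the originals.
    """
    rng = random.Random(seed)
    min_idx = [i for i, label in enumerate(y) if label == minority]
    n_maj = len(y) - len(min_idx)
    # G: total number of synthetic samples to generate (beta=1 targets full balance)
    G = round((n_maj - len(min_idx)) * beta)
    if G <= 0 or len(min_idx) < 2:
        return [list(x) for x in X], list(y)

    def nearest(i, pool):
        # k nearest points to X[i] within `pool` (Euclidean), excluding i itself
        others = [j for j in pool if j != i]
        others.sort(key=lambda j: math.dist(X[i], X[j]))
        return others[:k]

    # r_i: share of majority samples among the k nearest neighbours in the whole data set
    ratios = []
    for i in min_idx:
        nn = nearest(i, range(len(y)))
        ratios.append(sum(y[j] != minority for j in nn) / len(nn))
    total = sum(ratios)
    # normalise r_i into a density; fall back to uniform if no point has majority neighbours
    weights = [r / total for r in ratios] if total else [1 / len(min_idx)] * len(min_idx)

    X_new, y_new = [list(x) for x in X], list(y)
    for i, w in zip(min_idx, weights):
        neighbours = nearest(i, min_idx)  # minority-only neighbours used for interpolation
        for _ in range(round(w * G)):  # g_i; rounding means the total may differ slightly from G
            z = rng.choice(neighbours)
            lam = rng.random()
            X_new.append([a + lam * (b - a) for a, b in zip(X[i], X[z])])
            y_new.append(minority)
    return X_new, y_new
```

In practice, the imbalanced-learn library provides a tested implementation in `imblearn.over_sampling.ADASYN`, which is used through `fit_resample(X, y)`. Oversampling is normally applied to the training set only, so that the test set keeps the original class distribution.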
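The four evaluation indexes can be computed from predicted default probabilities as sketched below. The sketch assumes that label 1 marks a defaulting borrower and uses a decision threshold of 0.5; both are assumptions. F1 is the harmonic mean of precision and recall, so the sketch reports precision alongside accuracy. AUC is computed as the probability that a randomly chosen defaulter receives a higher score than a randomly chosen non-defaulter, with ties counted as one half. The function name `evaluate` is hypothetical.

```python
def evaluate(y_true, y_score, threshold=0.5):
    """Accuracy, precision, recall, F1 and AUC for binary labels (1 = default, assumed)."""
    y_pred = [int(s >= threshold) for s in y_score]
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    # AUC: share of (defaulter, non-defaulter) pairs ranked correctly; ties count 1/2
    pos = [s for s, t in zip(y_score, y_true) if t == 1]
    neg = [s for s, t in zip(y_score, y_true) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    auc = wins / (len(pos) * len(neg)) if pos and neg else float("nan")
    return {"accuracy": (tp + tn) / len(y_true), "precision": precision,
            "recall": recall, "f1": f1, "auc": auc}
```

For production use, scikit-learn's `sklearn.metrics` module provides `accuracy_score`, `precision_score`, `recall_score`, `f1_score`, and `roc_auc_score` for the same quantities.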
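The abstract does not state how the BP neural network and XGBoost outputs are combined in the BP-XG integrated model. One simple possibility, shown here purely as an assumption, is weighted soft voting. The two models' predicted default probabilities are averaged with a weight `w` chosen on a validation set, here by minimising log loss over a grid of 11 weights. The names `blend`, `log_loss`, and `pick_weight` are hypothetical.

```python
import math


def blend(p_bp, p_xgb, w):
    """Weighted soft vote (assumed scheme): w * P_BP + (1 - w) * P_XGB per borrower."""
    return [w * a + (1 - w) * b for a, b in zip(p_bp, p_xgb)]


def log_loss(y_true, p, eps=1e-12):
    """Mean binary cross-entropy, with probabilities clipped away from 0 and 1."""
    total = 0.0
    for t, q in zip(y_true, p):
        q = min(max(q, eps), 1 - eps)
        total -= t * math.log(q) + (1 - t) * math.log(1 - q)
    return total / len(y_true)


def pick_weight(y_val, p_bp, p_xgb, steps=10):
    """Grid-search w in {0, 1/steps, ..., 1} for the lowest validation log loss."""
    grid = [i / steps for i in range(steps + 1)]
    return min(grid, key=lambda w: log_loss(y_val, blend(p_bp, p_xgb, w)))
```

Stacking is another common integration scheme, in which a second-level learner is trained on the base models' outputs. The abstract does not say which scheme the BP-XG integrated model uses.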