The Study On Bank Credit Risk Assessment Method Based On Data Mining

Posted on:2023-12-20

Degree:Master

Type:Thesis

Country:China

Candidate:K Y Jin

Full Text:PDF

GTID:2568306815991669

Subject:Information and Communication Engineering

Abstract/Summary:

An efficient, scientifically grounded credit risk assessment system is an effective way to reduce the credit default risk caused by information asymmetry between banks and their customers. Current bank customer data, however, are class-imbalanced, and the original data sets contain only a limited set of feature attributes. Conventional credit risk assessment models are built directly on the original data set and cannot make effective use of the deeper information it contains, so their generalization ability and the reliability of their predictions often fall short of what commercial banks need. This thesis therefore proposes a bank customer credit risk assessment system based on data mining theory. The system expands the feature dimension of the original data set through feature construction, mines the deeper information in the data, and balances the data set through combined oversampling and undersampling, which improves both the generalization ability of the system and the reliability of its predictions.

First, the original data set is preprocessed by handling missing values, handling outliers, and transforming data. To address the class imbalance, SMOTE (Synthetic Minority Oversampling Technique) is combined with Tomek links to balance the samples. This combination avoids the risk, present in plain undersampling, of discarding information-rich samples, and it also avoids misclassifying borderline samples because of the overlap introduced by the synthetic samples that SMOTE generates. As a result, the model recognizes the minority class better. Second, to overcome the limited feature attributes of the original data set, the thesis combines expert knowledge from financial risk control with data mining techniques such as feature binning and feature crossing to derive new features, screens them by IV (Information Value), and retains 25 feature fields that are useful for classification.

Finally, the balanced, feature-engineered, and conventionally preprocessed data sets were used as model inputs, and single and ensemble classifiers (logistic regression, naive Bayes, decision tree, random forest, XGBoost, and Stacking) were compared on the user credit data set published by Lending Club. The experimental results show that models trained on the balanced data set achieve higher AUC values. The ST-Stacking model performed best, with its AUC and accuracy improving to 91.77% and 88.63%, respectively.
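The abstract does not describe how SMOTE and Tomek links were implemented. As a rough illustration, the dependency-free Python sketch below oversamples the minority (default) class with SMOTE up to the size of the majority class and then removes Tomek links, i.e. pairs of opposite-class samples that are each other's nearest neighbour. It assumes numeric features and a binary label; the function names are hypothetical, and dropping both members of each link is one common variant (removing only the majority-class member is another). It is not the thesis's code; the imbalanced-learn library offers a maintained implementation as `imblearn.combine.SMOTETomek`.

```python
import math
import random


def smote(minority, n_new, k=5, rng=None):
    """SMOTE: create n_new synthetic samples, each interpolated between a random
    minority sample and one of its k nearest minority-class neighbours."""
    rng = rng or random.Random(0)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        others = [m for m in minority if m is not base]
        neighbours = sorted(others, key=lambda m: math.dist(base, m))[:k]
        partner = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> partner, in [0, 1)
        synthetic.append([b + gap * (p - b) for b, p in zip(base, partner)])
    return synthetic


def tomek_links(X, y):
    """Return indices of samples in a Tomek link: two samples of opposite class
    that are each other's nearest neighbour."""
    nearest = []
    for i, xi in enumerate(X):
        nearest.append(min((j for j in range(len(X)) if j != i),
                           key=lambda j: math.dist(xi, X[j])))
    return {i for i, j in enumerate(nearest) if y[i] != y[j] and nearest[j] == i}


def smote_tomek(X, y, minority_label=1, k=5, seed=0):
    """Oversample the minority class to the majority size with SMOTE, then drop
    both members of every Tomek link (binary labels assumed)."""
    rng = random.Random(seed)
    minority = [x for x, label in zip(X, y) if label == minority_label]
    n_majority = len(y) - len(minority)
    synthetic = smote(minority, max(n_majority - len(minority), 0), k, rng)
    X_all = list(X) + synthetic
    y_all = list(y) + [minority_label] * len(synthetic)
    drop = tomek_links(X_all, y_all)
    keep = [i for i in range(len(X_all)) if i not in drop]
    return [X_all[i] for i in keep], [y_all[i] for i in keep]
```

Because every Tomek link pairs one sample of each class and no sample can belong to two links, this variant leaves the two classes exactly balanced after cleaning.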
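The feature derivation step can be sketched as equal-frequency binning, a feature cross built from two binned features, and IV screening. The sketch below uses the usual IV definition, the sum over bins of (bad% - good%) * ln(bad% / good%). The bin count (5), the smoothing constant `eps`, and the 0.02 cutoff (a commonly cited rule of thumb below which a feature is treated as non-predictive) are assumptions: the abstract says only that 25 fields were retained and does not give the binning scheme or threshold. All function names are hypothetical.

```python
import bisect
import math


def quantile_edges(values, n_bins):
    """Cut points for equal-frequency binning (duplicate cut points are merged)."""
    s = sorted(values)
    return sorted({s[len(s) * q // n_bins] for q in range(1, n_bins)})


def to_bins(values, edges):
    """Map each value to a bin index; equal values always share a bin."""
    return [bisect.bisect_right(edges, v) for v in values]


def cross(bins_a, bins_b):
    """Feature cross: combine two binned features into one categorical feature."""
    return [f"{a}_{b}" for a, b in zip(bins_a, bins_b)]


def information_value(bins, y, eps=0.5):
    """IV = sum over bins of (bad% - good%) * ln(bad% / good%), where y = 1 marks
    a default ("bad"). eps is additive smoothing so empty cells avoid log(0)."""
    categories = set(bins)
    total_bad = sum(y)
    total_good = len(y) - total_bad
    iv = 0.0
    for c in categories:
        bad = sum(1 for b, t in zip(bins, y) if b == c and t == 1)
        good = sum(1 for b, t in zip(bins, y) if b == c and t == 0)
        p_bad = (bad + eps) / (total_bad + eps * len(categories))
        p_good = (good + eps) / (total_good + eps * len(categories))
        iv += (p_bad - p_good) * math.log(p_bad / p_good)
    return iv


def select_by_iv(features, y, n_bins=5, threshold=0.02):
    """Bin each numeric feature; return (names with IV >= threshold, all IVs)."""
    ivs = {}
    for name, values in features.items():
        bins = to_bins(values, quantile_edges(values, n_bins))
        ivs[name] = information_value(bins, y)
    return [name for name, iv in ivs.items() if iv >= threshold], ivs
```

For example, with `y = [0] * 50 + [1] * 50`, the feature `list(range(100))` separates the classes and receives a large IV, whereas `[i % 10 for i in range(100)]` has identical bin counts in both classes, so its IV is exactly 0 and `select_by_iv` drops it.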
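Stacking trains a meta-learner on the predictions of several base learners, and those predictions should be out-of-fold so that the meta-learner does not learn from base models scoring their own training data. The pure-Python sketch below illustrates only that scheme: the two base learners (`NearestCentroid`, `Stump`) and the gradient-descent logistic meta-learner (`LogisticMeta`) are hypothetical toy stand-ins, not the classifiers used in the thesis, and nothing here reproduces the ST-Stacking configuration. scikit-learn's `StackingClassifier` implements the same general idea for real estimators.

```python
import math


class NearestCentroid:
    """Toy base learner: score = relative closeness to the class-1 centroid."""

    def fit(self, X, y):
        def centroid(label):
            rows = [x for x, t in zip(X, y) if t == label]
            return [sum(col) / len(rows) for col in zip(*rows)]
        self.c0, self.c1 = centroid(0), centroid(1)
        return self

    def predict_proba(self, x):
        d0, d1 = math.dist(x, self.c0), math.dist(x, self.c1)
        return d0 / (d0 + d1) if d0 + d1 > 0 else 0.5


class Stump:
    """Toy base learner: threshold feature 0 halfway between the class means."""

    def fit(self, X, y):
        m0 = sum(x[0] for x, t in zip(X, y) if t == 0) / y.count(0)
        m1 = sum(x[0] for x, t in zip(X, y) if t == 1) / y.count(1)
        self.threshold, self.sign = (m0 + m1) / 2, (1 if m1 >= m0 else -1)
        return self

    def predict_proba(self, x):
        return 1.0 if self.sign * (x[0] - self.threshold) > 0 else 0.0


class LogisticMeta:
    """Meta-learner: logistic regression fitted by batch gradient descent."""

    def __init__(self, lr=0.5, epochs=500):
        self.lr, self.epochs = lr, epochs

    def predict_proba(self, z):
        s = self.b + sum(w * v for w, v in zip(self.w, z))
        return 1.0 / (1.0 + math.exp(-s))

    def fit(self, Z, y):
        self.w, self.b = [0.0] * len(Z[0]), 0.0
        for _ in range(self.epochs):
            grad_w, grad_b = [0.0] * len(self.w), 0.0
            for z, t in zip(Z, y):
                err = self.predict_proba(z) - t
                grad_w = [g + err * v for g, v in zip(grad_w, z)]
                grad_b += err
            self.w = [w - self.lr * g / len(Z) for w, g in zip(self.w, grad_w)]
            self.b -= self.lr * grad_b / len(Z)
        return self


def out_of_fold_features(X, y, base_factories, k=5):
    """Meta-features: each base prediction for a sample comes from a model
    trained on the other k-1 (interleaved) folds."""
    n = len(X)
    meta = [[0.0] * len(base_factories) for _ in range(n)]
    for f in range(k):
        held_out = list(range(f, n, k))
        train = [i for i in range(n) if i % k != f]
        X_tr, y_tr = [X[i] for i in train], [y[i] for i in train]
        for j, make in enumerate(base_factories):
            model = make().fit(X_tr, y_tr)
            for i in held_out:
                meta[i][j] = model.predict_proba(X[i])
    return meta


class Stacking:
    def __init__(self, base_factories, meta_factory, k=5):
        self.base_factories, self.meta_factory, self.k = base_factories, meta_factory, k

    def fit(self, X, y):
        meta_X = out_of_fold_features(X, y, self.base_factories, self.k)
        self.meta = self.meta_factory().fit(meta_X, y)
        # Base learners used at prediction time are refitted on all the data.
        self.bases = [make().fit(X, y) for make in self.base_factories]
        return self

    def predict_proba(self, x):
        return self.meta.predict_proba([m.predict_proba(x) for m in self.bases])
```

Here `Stacking([NearestCentroid, Stump], LogisticMeta, k=5).fit(X, y)` builds the meta-features from 5 interleaved folds, fits the meta-learner on them, and then refits each base learner on all of `X` for prediction.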
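The results are reported as AUC and accuracy. AUC does not depend on a decision threshold, which is one reason it is preferred over accuracy on imbalanced credit data. Below is a minimal sketch of both metrics, using the pairwise (Mann-Whitney) interpretation of AUC and an assumed 0.5 threshold for accuracy; the abstract does not state which threshold was used, and the function names are illustrative.

```python
def roc_auc(y_true, scores):
    """AUC as the probability that a random positive (default) is scored higher
    than a random negative; ties count 1/2. O(P * N), fine for illustration."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))


def accuracy(y_true, scores, threshold=0.5):
    """Share of samples whose thresholded score matches the label."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == t for p, t in zip(preds, y_true)) / len(y_true)


y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(y_true, scores))   # 0.75
print(accuracy(y_true, scores))  # 0.75
```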

Keywords/Search Tags:

Data mining, Credit risk, Imbalanced data, Feature construction, Stacking algorithm

Related items

1	Application And Research Of Multi-classifier Fusion Algorithm In Credit Risk Control
2	Optimization Of P2P Personal Credit Risk Assessment Model Based On Data Mining Technology
3	Research On Individual Credit Risk Assessment For Imbalanced Data
4	Research On Credit Evaluation Method Based On Mixed Sampling And Stacking Integration
5	Research On Credit Evaluation Based On Improved Oversampling Method And Adaptive Ensemble Model
6	Research On Credit Default Risk Control Under Imbalanced Data
7	Bank Credit Scoring System Based On Data Mining Research
8	Research On Credit Card Fraud Detection Based On Machine Learning Algorithm
9	Implementation Of Credit Risk Management System Based On Data Mining
10	Research On Credit Risk Of Commercial Banks Based On Data Mining Technology