Rejection Inference Based On Semi-supervised Logistic Regression With A Skewed Entropy Regularization Term

Posted on: 2021-06-02
Degree: Master
Type: Thesis
Country: China
Candidate: Y Han
GTID: 2510306302976169
Subject: Management Science and Engineering
Abstract/Summary:
In recent years, China's economic growth has been accelerating, and the loan business, especially personal lending, has developed rapidly. At the same time, as big data storage technology has matured, personal information has become easier to store and obtain, and big data risk control models have become increasingly effective. However, because China's credit system developed relatively late, there is no unified institution that manages personal credit and no uniform scoring standard. As a result, almost every loan company maintains its own user credit scoring system. For these companies, controlling risk by predicting each borrower's repayment probability while also identifying more high-quality customers has become a crucial issue.

Currently, most risk control models are built on accepted samples, because only accepted applicants carry a label indicating whether the loan was repaid. In actual lending, however, the number of rejected samples is much larger than the number of accepted samples. Building a model from accepted samples alone has two drawbacks. On the one hand, a great deal of information in the rejected samples is wasted. On the other hand, the purpose of a risk control model is to evaluate the credit of all applicants, and a model fitted only to accepted samples learns biased parameters because of sample selection bias, which reduces the prediction accuracy of the credit scoring model. In response to this problem, scholars proposed the concept of "reject inference", which aims to reduce the impact of sample selection bias during modeling. Many methods for implementing reject inference have since been proposed, and researchers continue to examine whether reject inference is necessary, to test the effectiveness of each method, and to summarize how the methods relate to one another and under what conditions each applies. In its early development, most studies implemented reject inference with statistical techniques, but in recent years, with the development of machine learning and deep learning, machine-learning-based methods have emerged.

Against this lending background, this thesis uses semi-supervised learning to combine logistic regression, the algorithm most commonly used in risk control modeling, with information entropy from information theory, and proposes a logistic regression algorithm with a skewed entropy regularization term. The algorithm is based on entropy-regularized semi-supervised learning, with the original formulation adjusted to fit the actual lending scenario. After the theoretical study of the algorithm, real loan data are used to verify its effectiveness.

The results show the following. First, the logistic regression model performs well in credit evaluation, but its predictions differ substantially between rejected and accepted samples, which indicates that the fitted parameters are biased. Second, logistic regression with the skewed entropy regularization term helps improve the accuracy of the risk control model: the model's score improves on both accepted and rejected users, a conclusion verified on two different data sets, so the model effectively mitigates the bias caused by sample selection. Third, in big data risk control modeling, feature selection and feature engineering matter greatly to model performance, and their impact must be fully considered during modeling.
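The abstract does not give the objective function, so the following is only a minimal sketch of the general idea it describes: logistic regression fitted on labeled (accepted) samples, with an entropy penalty on the predictions for unlabeled (rejected) samples. It uses the plain, symmetric binary entropy of standard entropy-regularized semi-supervised learning; the thesis's skewed variant is not specified here and is not reproduced. All function names, the weight `lam`, the learning rate, and the synthetic data are assumptions made for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def binary_entropy(p, eps=1e-12):
    # Plain symmetric entropy; the thesis's skewed version is NOT modeled here.
    p = np.clip(p, eps, 1 - eps)
    return -(p * np.log(p) + (1 - p) * np.log(1 - p))

def objective(w, X_acc, y_acc, X_rej, lam):
    # Mean log loss on accepted (labeled) rows
    # plus lam * mean prediction entropy on rejected (unlabeled) rows.
    p_acc = np.clip(sigmoid(X_acc @ w), 1e-12, 1 - 1e-12)
    log_loss = -np.mean(y_acc * np.log(p_acc) + (1 - y_acc) * np.log(1 - p_acc))
    return log_loss + lam * np.mean(binary_entropy(sigmoid(X_rej @ w)))

def gradient(w, X_acc, y_acc, X_rej, lam):
    # Supervised part: X^T (p - y) / n.
    grad_sup = X_acc.T @ (sigmoid(X_acc @ w) - y_acc) / len(y_acc)
    # Entropy part: with z = x.w and p = sigmoid(z), dH/dz = -z * p * (1 - p).
    z_rej = X_rej @ w
    p_rej = sigmoid(z_rej)
    grad_ent = X_rej.T @ (-z_rej * p_rej * (1 - p_rej)) / len(z_rej)
    return grad_sup + lam * grad_ent

def fit(X_acc, y_acc, X_rej, lam=0.1, lr=0.1, n_iter=500):
    # Plain fixed-step gradient descent, chosen for brevity (an assumption).
    w = np.zeros(X_acc.shape[1])
    for _ in range(n_iter):
        w -= lr * gradient(w, X_acc, y_acc, X_rej, lam)
    return w

# Synthetic illustration only; not the thesis's data.
rng = np.random.default_rng(0)
true_w = np.array([0.2, 1.0, -1.0, 0.5])
X_all = np.hstack([np.ones((300, 1)), rng.normal(size=(300, 3))])
y_all = (rng.random(300) < sigmoid(X_all @ true_w)).astype(float)
# Mimic selection bias: only applicants with a high first feature are accepted.
accepted = X_all[:, 1] > 0.3
w_hat = fit(X_all[accepted], y_all[accepted], X_all[~accepted], lam=0.1)
print("fitted weights:", w_hat)
```

Setting `lam=0` reduces the sketch to ordinary logistic regression on the accepted samples, which makes it easy to compare the two fits on the same data.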
Keywords/Search Tags: Reject inference, credit score, semi-supervised learning, logistic regression