
Research On LightGBM-based Online Loan Risk Prediction On Spark Platform

Posted on: 2024-02-19
Degree: Master
Type: Thesis
Country: China
Candidate: H Y Wan
GTID: 2568307136495424
Subject: Computer technology
Abstract/Summary:
Online loans not only meet the diverse borrowing needs of residents and enterprises but also promote economic development and financial inclusion. However, the large number of defaults in online lending has caused heavy losses and difficulties for lending platforms and borrowers. With the massive growth in user loan information, predicting user default risk and evaluating user credit ratings quickly and effectively has become an important and urgent problem in the field of online loans. To address this problem, this thesis uses big data and machine learning technology to build a LightGBM-based online loan risk prediction system on the Spark platform, providing online loan platforms with an efficient, accurate, and stable tool for user risk assessment and credit scoring. The main contributions and innovations of this thesis are:

(1) To handle the label imbalance of online loan data, an improved oversampling algorithm is proposed. The concept of sample density is introduced into the Borderline-SMOTE algorithm, the method for synthesizing new samples is improved, and the K-nearest neighbor algorithm is used to screen the synthesized minority-class samples, further optimizing the dataset. An oversampling method for online loan data is designed on the Spark platform, and an approximate K-nearest neighbor algorithm based on hybrid spill trees is used to parallelize the oversampling algorithm.

(2) The sparrow search algorithm is used to optimize the hyperparameters of the LightGBM model, with a fitness function and a discretization strategy designed for this purpose. On the Spark platform, three parallel modes of the LightGBM model are analyzed, the process of searching model parameters on the cluster is designed, and a credit score conversion method is established based on the model's default prediction results.

(3) A prototype system for online loan risk prediction is built. The system adopts a four-layer structure of browser, server, distributed system, and distributed storage, taking into account maintainability, security, and stability. It provides five major functions: information management, risk prediction, data management, model management, and cluster management. It is implemented using the Flask server framework.

This thesis verifies the effectiveness and innovation of the proposed methods through multiple experiments. The experimental results show that the improved oversampling algorithm has advantages in AUC and KS statistic in most cases; the LightGBM model optimized by the sparrow search algorithm is more accurate than other machine learning models on the Lending Club dataset, with an AUC of 0.935 and a KS statistic of 0.740; the Spark platform speeds up the algorithms to a certain extent and reduces running time; and the credit score conversion method meets the requirements of online loan credit scoring and can distinguish users with different credit ratings.
Keywords/Search Tags: Online loan, Risk prediction, Oversampling, Sparrow search algorithm, LightGBM
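
The two evaluation metrics reported in the abstract, AUC and the KS statistic, can both be read off the ROC curve: AUC is the area under it, and KS is the largest gap between the true positive rate and the false positive rate over all thresholds. The following is a minimal sketch in plain Python, not the thesis's implementation. The function names `roc_points`, `auc`, and `ks_statistic`, the convention that label 1 marks a default, and the toy labels and scores are all illustrative assumptions.

```python
def roc_points(labels, scores):
    """Return ROC points (fpr, tpr), sweeping the threshold from high to low.

    labels: iterable of 0/1 (assumption: 1 = default, 0 = repaid)
    scores: predicted default probabilities, same length as labels
    """
    labels = list(labels)
    scores = list(scores)
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both classes must be present")

    # Sort samples by descending score so each step lowers the threshold.
    pairs = sorted(zip(scores, labels), key=lambda p: -p[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(pairs):
        current = pairs[i][0]
        # Samples with tied scores cross the threshold together.
        while i < len(pairs) and pairs[i][0] == current:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


def ks_statistic(points):
    """Largest gap between TPR and FPR over all thresholds."""
    return max(tpr - fpr for fpr, tpr in points)


if __name__ == "__main__":
    # Toy data, for illustration only (not from the Lending Club dataset).
    y_true = [1, 1, 0, 1, 0, 0]
    y_score = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
    pts = roc_points(y_true, y_score)
    print("AUC:", auc(pts))
    print("KS:", ks_statistic(pts))
```

Both metrics depend only on how the model ranks samples, not on the absolute predicted probabilities, which is one reason they are commonly used for imbalanced default data.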