| In recent years, China’s bond market has expanded rapidly: a variety of innovative bonds have emerged, and the market now ranks second in the world. However, owing to factors such as the economic cycle, the COVID-19 pandemic, and the international situation, the number of bond defaults has risen rapidly, and defaults now span almost all industries. Research on corporate bond default risk prediction currently suffers from a lack of credit data and from insufficient model accuracy and interpretability. It is therefore extremely important to establish an effective corporate bond risk prediction model that uses machine learning algorithms to mine the important factors affecting default risk from a large amount of non-credit data and thus predicts default risk more accurately, so as to prevent and cope with potential corporate bond defaults, maintain the stable and healthy development of China’s bond market, and better serve the development of the real economy. First, the characteristics of corporate bonds were analyzed using a sample of corporate bonds in China’s bond trading market from 2015 to 2022, and the data were found to be high-dimensional, complexly correlated, and imbalanced. Indicators related to macro factors, debt characteristics, and corporate financial and non-financial factors that affect corporate bond default risk were initially identified through an in-depth review of the literature. Indicators with strong predictive power were then screened by calculating information value (IV), which reduces the dimensionality of the feature set. Next, to improve prediction accuracy, the GA-LightGBM model was constructed by optimizing the LightGBM model with a genetic algorithm. The Borderline-SMOTE method is then introduced into the prediction model to
perform nearest-neighbor linear interpolation on the boundary samples and synthesize new minority-class samples, reducing the impact of data imbalance on prediction accuracy. To compare the impact of different indicators on the accuracy of the prediction model, all indicators are divided into three groups: the financial indicator group, containing only corporate financial indicators; the financial plus non-financial indicator group, containing bond characteristics, corporate non-financial indicators, and financial indicators; and the optimized indicator group, containing all indicators. Second, the feature importance distribution was obtained by counting the number of times each feature is used for splitting across all decision trees, which reveals the important factors affecting corporate bond default risk, and the stability of the prediction model was then verified by ten-fold cross-validation. Finally, to demonstrate the applicability and performance of the constructed model in corporate default risk prediction, it is compared with the logit model, SVM model, BP neural network, XGBoost model, and LightGBM model on the optimized indicator group. The research finds that: (1) For the GA-LightGBM corporate bond default risk model built from the genetic algorithm and the LightGBM algorithm, the prediction accuracy of the optimized indicator group is 2.3% and 0.47% higher than that of the financial indicator group and the financial plus non-financial indicator group, respectively, indicating that including non-financial indicators and macro-environment indicators can improve prediction accuracy to a certain extent. (2) The factors with a greater impact on corporate bond default risk comprise 19 indicators, such as the shareholding ratio of major shareholders, the quick ratio, and the year-on-year growth rate of total assets, among which the top three in importance are the shareholding ratio of major
shareholders, the total issue amount, and the quick ratio, and the bottom three in importance are the year-on-year growth rate of CPI, the shareholder’s corporate attributes, and the year-on-year growth rate of GDP. (3) Comparing different models shows that the prediction accuracy of the GA-LightGBM model is higher than that of the logit model, SVM model, BP neural network, XGBoost model, and LightGBM model; ranked from best to worst, the models are: GA-LightGBM, LightGBM, XGBoost, BP neural network, logit, and SVM. The main contribution of this paper is as follows: the existing literature on corporate bond default risk forecasting focuses mostly on the selection of financial and non-financial indicators, whereas this paper adds macroeconomic indicators based on an analysis of corporate bond default risk factors and forecasts corporate bond default risk with the GA-LightGBM model, which is constructed by using a genetic algorithm to optimize the LightGBM model. |
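
The IV-based indicator screening mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy data, the additive smoothing constant, and the 0.02 cut-off (a common rule of thumb, not a value stated in the source) are all assumptions made for illustration, and each indicator is assumed to be already discretized into bins.

```python
import math
from collections import defaultdict


def information_value(bins, labels, smoothing=0.5):
    """Information value of one binned indicator against a binary label (1 = default).

    `smoothing` is an assumed additive constant that avoids log(0) for empty cells.
    """
    counts = defaultdict(lambda: [0, 0])  # bin -> [non-default count, default count]
    for b, y in zip(bins, labels):
        counts[b][y] += 1

    n_bins = len(counts)
    total_good = sum(c[0] for c in counts.values()) + smoothing * n_bins
    total_bad = sum(c[1] for c in counts.values()) + smoothing * n_bins

    iv = 0.0
    for good, bad in counts.values():
        p_good = (good + smoothing) / total_good  # share of non-defaults in this bin
        p_bad = (bad + smoothing) / total_bad     # share of defaults in this bin
        woe = math.log(p_good / p_bad)            # weight of evidence of the bin
        iv += (p_good - p_bad) * woe
    return iv


def screen_indicators(binned_indicators, labels, threshold=0.02):
    """Keep indicators whose IV reaches the (assumed) threshold."""
    ivs = {name: information_value(bins, labels)
           for name, bins in binned_indicators.items()}
    return {name: iv for name, iv in ivs.items() if iv >= threshold}


# Hypothetical toy sample: 8 bonds, 4 of which defaulted.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
binned_indicators = {
    "quick_ratio_bin": ["high", "high", "high", "low", "low", "low", "low", "high"],
    "noise_bin": ["a", "b", "a", "b", "a", "b", "a", "b"],
}
selected = screen_indicators(binned_indicators, labels)
print(selected)
```

In this toy sample the hypothetical `noise_bin` indicator is distributed identically across defaulted and non-defaulted bonds, so its IV is zero and it is dropped, while `quick_ratio_bin` is retained.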