| Bonds are one of the most effective means of direct financing for enterprises, and the continuous development of the bond market has enriched and improved China’s financial system at multiple levels. At the same time, however, defaults in which companies fail to pay principal and interest on time because of financial difficulties have become common in recent years. Such defaults pose a serious hidden danger for companies seeking to prevent financial crises, for investors and creditors seeking to protect their capital rights, and for government departments seeking to supervise the normal operation of the capital market effectively. It is therefore of great practical significance to predict a company’s financial distress accurately and effectively. In addition, as the reform of China’s market economy system continues to deepen and the capital market develops rapidly, demand from all parts of society for research on corporate financial distress prediction has become increasingly urgent. There is a pressing need to combine artificial intelligence with traditional business and to establish an effective early warning mechanism for financial status through improved forecasting methods. This thesis takes into account factors specific to China’s national conditions, constructs a prediction index system suited to those conditions, and, drawing on current data mining techniques and machine learning theory, focuses on the handling of imbalanced samples in financial distress prediction. It proposes an early warning model of corporate financial distress that offers high prediction accuracy, quantifiability and a degree of explainability. The main research contents are as follows. First, against the background of China’s particular national conditions, this thesis uses domestic bond market data from 2014 to 2021, takes existing domestic and foreign research on corporate financial distress prediction as a reference, starts from the four major determinants of 
accounting information, market information, macro information and other information, and considers the compatibility between data, models and variable indicators, summarizes the specific index variables that measure a company’s capabilities in each respect as a reference for establishing the system, and screens these variables using ANOVA, correlation analysis, VIF tests and other methods to eliminate variables that conflict with one another or are incompatible with the data and models. On this basis it establishes a relatively comprehensive, effective and reasonable index system, and detects outliers in the data with the robROSE algorithm to construct the company financial data set. Second, to address the imbalanced sample problem, this thesis works with two groups of balanced data sets: data sets without regional information features, processed by the traditional SMOTE algorithm and the GAN algorithm, and data sets with regional information features, processed by the Borderline-SMOTE algorithm and the BMW-SMOTE algorithm. Of the two, the group containing regional information features performed significantly better. Third, this thesis takes the unprocessed data set and the balanced data sets processed by the traditional SMOTE, GAN, Borderline-SMOTE and BMW-SMOTE algorithms as the data basis for the classification models. It constructs single-learner classification models, namely logistic regression (LR), artificial neural network (ANN) and support vector machine (SVM), and multi-learner ensemble models, namely random forest (RF), extreme gradient boosting (XGBoost), gradient boosting decision tree (GBDT) and adaptive boosting (AdaBoost). Classification performance was evaluated using the F-measure, G-mean and AUC. The results showed that the regional information features had a gain effect on the imbalanced sample problem, and that building the financial distress early warning 
model with ensemble learning algorithms brought performance advantages. |
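
The three evaluation measures named in the abstract can be illustrated with a minimal sketch. It assumes the conventional definitions: F-measure as the weighted harmonic mean of precision and recall, G-mean as the geometric mean of sensitivity and specificity, and AUC as the probability that a distressed sample is scored above a non-distressed one. The function names, the toy labels and the 0.5 decision threshold are hypothetical and are not taken from the thesis.

```python
from math import sqrt


def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn), treating label 1 as the minority (distressed) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def f_measure(y_true, y_pred, beta=1.0):
    """Weighted harmonic mean of precision and recall (beta=1 gives F1)."""
    tp, fp, _, fn = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def g_mean(y_true, y_pred):
    """Geometric mean of sensitivity (minority recall) and specificity."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return sqrt(sensitivity * specificity)


def auc(y_true, scores):
    """Pairwise (Mann-Whitney) AUC: share of positive/negative pairs ranked correctly."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one sample of each class")
    # A tie between a positive and a negative score counts as half a win.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 2 distressed companies (label 1) out of 10.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.4, 0.6, 0.3, 0.2, 0.1, 0.1, 0.2, 0.3, 0.05]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

print("F-measure:", f_measure(y_true, y_pred))
print("G-mean:   ", g_mean(y_true, y_pred))
print("AUC:      ", auc(y_true, scores))
```

On this toy data the classifier is right on 8 of 10 samples, yet the F-measure is only 0.5 and the G-mean about 0.66, because one of the two distressed companies is missed. This is why measures of this kind are generally preferred to plain accuracy when the distressed class is rare.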