| In recent years, machine learning, especially ensemble learning, has been widely used in finance. Bagging and Boosting are two important representatives of ensemble learning. Although their application scenarios have expanded greatly, their predictive and test performance differs substantially across scenarios, so there is no consistent conclusion about which of the two is better. Identifying financial fraud is a major application of concern to the capital market. The effectiveness of ensemble learning in identifying financial fraud, and in particular the relative performance of Bagging and Boosting, has attracted widespread attention in the academic community. This thesis uses data on A-share companies listed on the Shenzhen Stock Exchange from 2013 to 2018 to systematically compare the actual performance of five ensemble learning models, based on Bagging and Boosting strategies, in identifying financial fraud. Among these models, the random forest model based on the Bagging strategy outperforms the GBC, AdaBoost, XGBoost, and LightGBM models based on the Boosting strategy, as shown by ranking metrics such as NDCG@k, Precision@k, and Recall@k. This result can be explained by the data characteristics of the financial fraud setting and the design of the Bagging algorithm. Financial fraud data are small in scale and high in feature dimension, which naturally suits random forest, an ensemble method based on the Bagging strategy. Moreover, the Bagging strategy gives full play to the simple ensemble idea that “three cobblers are worth one Zhuge Liang”, simulating the process of aggregating the judgments that a large number of market participants make from different information sets.
In addition, through a model interpretability analysis based on feature importance, this thesis finds that although the rankings of important features differ somewhat across ensemble learning models, some important features consistently receive attention from every model. Among them, profitability and asset quality features are more important for identifying financial fraud. A sharp drop in income, continuous losses, and similar signals usually indicate a further increase in the probability of fraud. This thesis reveals the mechanism by which ensemble learning strategies identify financial fraud and extends research on ensemble learning in the financial field. |
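
The ranking metrics named above (Precision@k, Recall@k, NDCG@k) can be illustrated with a minimal Python sketch. It assumes binary fraud labels (1 = fraud, 0 = no fraud) and the common textbook definitions of each metric; the thesis's exact formulas are not given here, and the function names, scores, and labels below are hypothetical toy values, not results from the study.

```python
import math


def rank_by_score(scores, labels):
    """Return the labels reordered from highest to lowest predicted score."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [labels[i] for i in order]


def precision_at_k(ranked_labels, k):
    """Share of the top-k ranked firms that are actually fraudulent."""
    return sum(ranked_labels[:k]) / k


def recall_at_k(ranked_labels, k):
    """Share of all fraudulent firms that appear in the top k."""
    total = sum(ranked_labels)
    return sum(ranked_labels[:k]) / total if total else 0.0


def ndcg_at_k(ranked_labels, k):
    """DCG of the top k divided by the DCG of an ideal ordering."""
    def dcg(labels):
        # Position i (0-based) is discounted by log2(i + 2).
        return sum(rel / math.log2(i + 2) for i, rel in enumerate(labels))

    ideal = sorted(ranked_labels, reverse=True)
    idcg = dcg(ideal[:k])
    return dcg(ranked_labels[:k]) / idcg if idcg else 0.0


# Hypothetical toy data: model scores for five firms and their true labels.
scores = [0.9, 0.2, 0.75, 0.4, 0.6]
labels = [1, 0, 0, 1, 0]

ranked = rank_by_score(scores, labels)
print(precision_at_k(ranked, 2))  # 0.5: one of the top two firms is fraudulent
print(recall_at_k(ranked, 2))     # 0.5: one of the two fraudulent firms is found
print(ndcg_at_k(ranked, 2))
```

All three metrics score only the top of the ranked list, which fits a setting where only the k most suspicious firms can be examined.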