| Although federated learning enables joint modeling while preserving data privacy and security, and thereby addresses the data integration dilemma faced by the banking industry, credit risk prediction still poses several problems and challenges. First, the data are highly skewed, so the model struggles to learn the features of default samples, which degrades the training of the federated learning model. Second, compared with centralized learning methods, traditional federated learning has several performance defects, such as client drift, low model accuracy, slow convergence, and system inefficiency, and these become more serious when the participants' data are not independent and identically distributed (non-IID). Building on the framework of the federated averaging algorithm, this thesis studies and improves federated learning in two respects: balanced sampling and algorithm optimization. The main research contents are as follows: (1) The mechanism of traditional credit risk prediction models is studied, and federated learning is applied to financial credit default prediction. Traditional machine learning models are evaluated on a public credit dataset, and the best-performing model is selected as the underlying model of the federated learning framework. A series of comparative experiments demonstrates the feasibility and effectiveness of federated learning for credit risk prediction. (2) To reduce the impact of data imbalance on the federated learning model in the credit default scenario, a hybrid sampling algorithm, DBADATomek, is proposed. First, Density-Based Spatial Clustering of Applications with Noise (DBSCAN) is applied to the minority-class samples. The Adaptive Synthetic Sampling (ADASYN) algorithm is then used to oversample according to the sampling weight of each sub-cluster, which reduces the intra-class imbalance. Finally, Tomek Links cleaning is used to correct the noisy data and overlapping samples generated by oversampling, which improves the quality of the synthetic samples and the classification performance of the model. Balanced sampling experiments on datasets with large differences between the numbers of positive and negative samples show that the algorithm is more robust than other sampling algorithms. (3) To address the poor performance of federated learning algorithms on the credit risk prediction task with non-IID data, a federated optimization algorithm for individual and group asynchronous sharing (IGFL) is proposed. In both client-side and server-side optimization, the behavior of individuals and groups is used to simulate the corresponding distribution, and a global attention mechanism is incorporated into the server-side optimization. While preserving user privacy, IGFL overcomes the limitations of traditional federated learning on non-IID problems and alleviates synchronization overhead to a certain extent. Experiments show that IGFL performs significantly better than the federated averaging algorithm, especially in the non-IID scenario, which demonstrates its robustness. |
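
For orientation, the server-side step of the federated averaging baseline that the thesis builds on can be sketched as a sample-weighted mean of the clients' parameters. This is a minimal illustration only, not the thesis's implementation and not the proposed IGFL algorithm: the function name `fedavg_aggregate`, the flat-list parameter representation, and the three-bank example values are all assumptions made for the sketch.

```python
from typing import List, Sequence


def fedavg_aggregate(
    client_weights: Sequence[Sequence[float]],
    client_sizes: Sequence[int],
) -> List[float]:
    """Average client parameter vectors, weighting each by its local sample count."""
    if not client_weights or len(client_weights) != len(client_sizes):
        raise ValueError("need at least one client and one sample count per client")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total sample count must be positive")

    dim = len(client_weights[0])
    global_weights = [0.0] * dim
    for weights, n_k in zip(client_weights, client_sizes):
        if len(weights) != dim:
            raise ValueError("all clients must share the same parameter shape")
        share = n_k / total  # this client's fraction of all training samples
        for i, w in enumerate(weights):
            global_weights[i] += share * w
    return global_weights


# Hypothetical example: three banks holding unequal amounts of local data.
bank_weights = [[1.0, 0.0], [3.0, 2.0], [5.0, 4.0]]
bank_sizes = [100, 100, 200]
print(fedavg_aggregate(bank_weights, bank_sizes))  # [3.5, 2.5]
```

Because each client contributes in proportion to its data volume, a client whose local distribution differs strongly from the others can pull the global model toward itself, which is one way to see why non-IID data is problematic for this baseline.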