| With the development of machine learning technology, banks have set up big data centers that treat big data as a strategic resource for business development, and they widely apply machine learning to credit card fraud detection and other fields. However, the data behind a credit card fraud detection model usually covers only information held within the bank, which limits the model's accuracy, even though data about fraudsters from non-banking industries is more likely to indicate fraudulent behavior. Laws and regulations on privacy protection restrict the data sources available to banks, so banks cannot improve their credit card fraud detection models through joint modeling with companies in other industries. Vertical federated learning offers a solution to these problems, but applying it to credit card fraud detection raises two major challenges. First, the class imbalance problem, which is widespread in this field, is difficult to solve. Second, in vertical federated learning the features are distributed among different participants, and operations such as cross-participant feature combination are restricted by privacy protection requirements, which limits how much the model can learn from the dataset. To address the first challenge, this thesis adapts the SMOTE algorithm to the vertical federated learning paradigm, yielding the Vertical Federated Learning Synthetic Minority Over-sampling Technique (VFL-SMOTE), which balances the dataset and uses homomorphic encryption to protect intermediate data and synthetic sample information. To train on the encrypted synthetic samples, this thesis designs the Vertical Federated Learning Logistic Regression algorithm (VFL-LR), which uses fully homomorphic encryption. Experiments show that VFL-SMOTE improves the F1 score of the VFL-LR model by 0.1675, achieving the goal of improving model training. To address the second challenge, that cross-participant feature combination cannot be carried out in the vertical federated setting, this thesis designs a fusion model (VFL-XGBoost-LR) that combines VFL-XGBoost with VFL-LR. The fusion model exploits the automated feature selection and combination ability of extreme gradient boosting (XGBoost) by using the discrete data generated by each of its leaves as input to the VFL-LR model. Experimental results on a bank risk-related dataset show that the fused model outperforms both the VFL-XGBoost model and the VFL-LR model in F1 score, AUC, and accuracy. Finally, this thesis designs and implements the Vertical Federated Learning Credit Card Fraud Detection System, which uses VFL-SMOTE as the core algorithm of its data balancing module, integrates a series of federated learning algorithms including VFL-XGBoost-LR into its algorithm module, and provides a multi-party scheduling platform and a visualization platform for training vertical federated learning models. This thesis also designs a credit card fraud detection interface through which banks can obtain model inference results. |
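
As a rough illustration of the oversampling idea that VFL-SMOTE builds on, the sketch below shows the interpolation step of standard SMOTE in plain Python. It is not the thesis's algorithm: it runs on a single party in plaintext and leaves out homomorphic encryption and any cross-participant neighbor search. The function name `smote`, the parameters `k` and `seed`, and the toy data are assumptions made for this example.

```python
import math
import random


def smote(minority, n_new, k=2, seed=0):
    """Create n_new synthetic minority samples by interpolating
    between a minority sample and one of its k nearest minority neighbors."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Candidate neighbors: every minority sample except the base itself.
        others = [p for p in minority if p is not base]
        neighbors = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbor = rng.choice(neighbors)
        gap = rng.random()  # interpolation factor in [0, 1)
        # The new sample lies on the segment between base and neighbor.
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbor)])
    return synthetic


# Toy minority class (hypothetical fraud samples with two features).
minority = [[1.0, 2.0], [1.5, 1.8], [1.2, 2.4], [0.9, 2.1]]
new_points = smote(minority, n_new=6)
```

Because each synthetic sample is a convex combination of two real minority samples, it stays inside the region the minority class already occupies. In a vertical federated setting the feature columns would be held by different participants, which is why the distance computation and the synthetic values call for protection such as the homomorphic encryption described above.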