
Decentralized Vertical Federated Learning Based On Random Forest

Posted on: 2024-02-10    Degree: Master    Type: Thesis
Country: China    Candidate: Z Y Li    Full Text: PDF
GTID: 2568307064997089    Subject: Engineering
Abstract/Summary:
Federated Learning (FL) is a distributed machine learning framework proposed by Google in recent years. It enables global model training over distributed datasets while preserving the privacy and security of participant data, which is achieved by exchanging encrypted model parameters between participants instead of raw private data. At present, most work on federated learning targets the scenario where data is partitioned horizontally by samples, but in practical applications the scenario where data is partitioned vertically by features is also common. In addition, most existing work relies on a trusted central server to organize, guide, and aggregate the training process, yet such a reliable, stable, and universally trusted central server is often difficult to obtain or establish in practice. To mitigate these problems, this paper proposes a decentralized vertical federated learning model based on the random forest algorithm, which suits scenarios where data is partitioned vertically by features and does not rely on a central server. Additionally, this paper presents an optimization method for the proposed model in imbalanced data scenarios. The main contributions are as follows:

1. This paper designs and implements a decentralized vertical federated learning model based on random forest. The model adopts a decentralized peer-to-peer network architecture for inter-client communication. In each training round of each classifier, the active party responsible for partitioning the sample set, the passive party holding the candidate features, and the label holder holding the label data complete that round's update by directly exchanging encrypted intermediate variables (a simplified sketch of this split-finding interaction is given below). The global model is composed of the local model data held by each client during training, and prediction is completed by invoking the local model data on each client. Moreover, the decentralized architecture allows multiple clients to train multiple independent classifiers concurrently, leading to better allocation of computing resources during training. In comparative experiments, the proposed model outperformed existing approaches on most datasets, achieving F1-score improvements of 0.6% to 2.8%, and also demonstrated superior training efficiency, with speed-ups of 1.3x to 7.1x over existing approaches on four public datasets.

2. This paper proposes a k-means-clustering-based optimization method for the above decentralized vertical federated learning model, which addresses its insufficient accuracy in imbalanced data scenarios. During optimization, each client performs k-means clustering on its local data within its own feature space and sends the resulting cluster centroids' sample ID sets to the label holder. After receiving these ID sets from all clients, the label holder mixes them with the minority-class sample ID set to obtain a new, balanced sample ID set and sends it to all clients for classifier training (a sketch of this re-balancing step is also given below). Experimental results show that the proposed optimization improves the model's F1-score by 12.6% and 4.5%, and its G-mean by 8.5% and 3.8%, on two imbalanced public datasets.

In summary, this paper presents a decentralized vertical federated learning model based on random forest that avoids the single-point load and the centralized storage of global model data caused by the central server in traditional federated learning, as well as the difficulty of obtaining or establishing an ideal central server. In this decentralized model, each client can train multiple independent classifiers concurrently, thereby improving training efficiency. For imbalanced data scenarios, the paper uses the k-means clustering algorithm to improve the model's ability to handle imbalanced data while preserving data privacy and security.
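The following is a minimal, illustrative sketch of one split-finding round in a decentralized vertical federated random forest, under simplifying assumptions: the class names (PassiveParty, LabelHolder), the median-threshold candidate splits, and the Gini-based scoring are chosen for illustration rather than taken from the thesis's exact protocol, and the homomorphic encryption of intermediate variables is omitted for brevity.

```python
import numpy as np

class PassiveParty:
    def __init__(self, features):
        # features: array of shape (n_samples, n_local_features), held only by this party
        self.features = np.asarray(features)

    def propose_splits(self, sample_ids):
        """Propose one candidate split per local feature: (feature_idx, threshold, left_ids)."""
        proposals = []
        for j in range(self.features.shape[1]):
            threshold = np.median(self.features[sample_ids, j])
            left_ids = [i for i in sample_ids if self.features[i, j] <= threshold]
            proposals.append((j, threshold, left_ids))
        return proposals

class LabelHolder:
    def __init__(self, labels):
        # labels: integer class labels, held only by the label holder
        self.labels = np.asarray(labels)

    def score_split(self, sample_ids, left_ids):
        """Weighted Gini impurity of a candidate split (lower is better)."""
        left_set = set(left_ids)
        right_ids = [i for i in sample_ids if i not in left_set]

        def gini(ids):
            if not ids:
                return 0.0
            p = np.bincount(self.labels[ids]).astype(float) / len(ids)
            return 1.0 - float(np.sum(p ** 2))

        n = len(sample_ids)
        return len(left_ids) / n * gini(left_ids) + len(right_ids) / n * gini(right_ids)

def active_party_find_split(sample_ids, passive_parties, label_holder):
    """Active party's role for one tree node: gather split proposals from the
    passive parties, have the label holder score them, and keep the best one.
    Only sample IDs and impurity scores cross party boundaries in this sketch."""
    best = None
    for party_id, party in enumerate(passive_parties):
        for feat, thr, left_ids in party.propose_splits(sample_ids):
            score = label_holder.score_split(sample_ids, left_ids)
            if best is None or score < best[0]:
                best = (score, party_id, feat, thr, left_ids)
    return best  # the winning party stores (feat, thr) locally; others only learn the ID partition
```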
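Below is a similarly hedged sketch of the k-means-based re-balancing step from contribution 2, assuming scikit-learn's KMeans and interpreting each cluster centroid's "sample ID" as the ID of the real sample closest to that centroid; how the thesis actually encodes and protects these ID sets is not shown.

```python
import numpy as np
from sklearn.cluster import KMeans

def client_centroid_sample_ids(local_features, sample_ids, n_clusters):
    """Each client clusters its local feature columns (typically over the
    majority class) and reports the IDs of the samples nearest to the
    resulting cluster centroids."""
    local_features = np.asarray(local_features)
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(local_features)
    centroid_ids = set()
    for center in km.cluster_centers_:
        nearest = int(np.argmin(np.linalg.norm(local_features - center, axis=1)))
        centroid_ids.add(sample_ids[nearest])
    return centroid_ids

def label_holder_balanced_ids(client_centroid_id_sets, minority_ids):
    """The label holder mixes the clients' centroid sample IDs with the
    minority-class IDs to obtain a (roughly) balanced training ID set,
    which it then broadcasts to all clients for classifier training."""
    balanced = set(minority_ids)
    for id_set in client_centroid_id_sets:
        balanced |= id_set
    return sorted(balanced)
```

In this sketch, clustering down-samples the over-represented data to a handful of representative sample IDs, while all minority-class IDs are retained, so the resulting training set is far less skewed than the original.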
Keywords/Search Tags: federated learning, random forest, homomorphic encryption, differential privacy, k-means clustering