
Research And Design Of A Distributed Framework For Federated Learning With Gaussian Noise Differential Privacy

Posted on: 2024-04-04
Degree: Master
Type: Thesis
Country: China
Candidate: Y Y Xu
GTID: 2558307079972259
Subject: Electronic information
Abstract:
The popularity of the Internet and smart devices has provided massive amounts of data for machine learning and deep learning. In both traditional machine learning and deep learning, high-precision models depend on high-quality data, and this training data often contains private information. In the centralized learning scenario, an organization collects, stores and uses personal data without protection, so a privacy protection mechanism is needed to protect personal privacy. Google proposed federated learning to address this. Although federated learning avoids leaking data directly, the parameters exchanged between client and server still risk exposing the original data. In addition, the assumption of independent and identically distributed (IID) data sets does not hold in cross-device or client-side federated learning scenarios, yet existing federated learning algorithms still rely on it. How to improve efficiency and performance as much as possible while ensuring the privacy and security of personal data is therefore a major open problem. This thesis focuses on privacy protection for federated learning under Non-IID data and conducts a series of studies on the basis of existing federated learning frameworks. The main research work is as follows:

(1) Simulation experiments verify the main factors that affect the performance of federated learning models in the Non-IID scenario. Existing federated learning algorithms are compared under different data distributions from three perspectives: data imbalance, label distribution skew and feature distribution skew. They are also compared against traditional centralized learning. A mathematical analysis shows that the decline in federated learning quality under Non-IID data is caused by the data distribution of the samples.

(2) A new federated learning framework, PSD-FL (Privacy Synthesis Data-FL), is proposed to mitigate data imbalance in the Non-IID scenario. It extends traditional federated learning to three stages. The pre-training stage trains a generative model; for this stage, a Gaussian noise-based differential privacy GAN is proposed that combines the Lipschitz condition with differential privacy sensitivity to generate high-quality, privacy-protected data. In the preparation stage, the server maintains a shared synthetic data set. In the federated training stage, a pseudo-label update mechanism and a server update mechanism are proposed. Experiments show that, under a reasonable privacy budget, the framework performs well on Non-IID data sets in both supervised and semi-supervised scenarios.

(3) Based on the two studies above, the thesis builds a federated learning platform based on Gaussian noise differential privacy, and designs and implements its server, client and communication components. The platform offers a convenient federated training process for users who are not federated learning specialists, together with high-quality training results and user privacy protection. Several rounds of testing confirmed the reliability and usefulness of the platform.
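To make the idea of Gaussian noise differential privacy concrete, the sketch below shows the textbook Gaussian mechanism applied to a sum of client updates: each update is clipped to a fixed L2 norm so that the sensitivity of the sum is bounded, and Gaussian noise scaled to that bound is added. This is an illustration under stated assumptions, not the thesis's implementation. The function names, the add/remove-one-client adjacency (which gives sensitivity equal to `clip_norm`), and the classic calibration sigma = sensitivity * sqrt(2 ln(1.25/delta)) / epsilon (valid only for 0 < epsilon < 1) are all assumptions; the abstract does not state how PSD-FL calibrates its noise.

```python
import math
import random


def clip_l2(vector, clip_norm):
    """Return a copy of vector rescaled so that its L2 norm is at most clip_norm."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm <= clip_norm:
        return list(vector)
    scale = clip_norm / norm
    return [x * scale for x in vector]


def gaussian_sigma(sensitivity, epsilon, delta):
    """Noise scale of the classic Gaussian mechanism (requires 0 < epsilon < 1)."""
    if not (0.0 < epsilon < 1.0):
        raise ValueError("this calibration is only valid for 0 < epsilon < 1")
    if not (0.0 < delta < 1.0):
        raise ValueError("delta must lie in (0, 1)")
    return sensitivity * math.sqrt(2.0 * math.log(1.25 / delta)) / epsilon


def noisy_sum(updates, clip_norm, epsilon, delta, rng):
    """Sum clipped client updates and add Gaussian noise to every coordinate.

    Assumption: neighbouring inputs differ by adding or removing one client,
    so the L2 sensitivity of the clipped sum is clip_norm.
    """
    if not updates:
        raise ValueError("at least one update is required")
    total = [0.0] * len(updates[0])
    for update in updates:
        for i, x in enumerate(clip_l2(update, clip_norm)):
            total[i] += x
    sigma = gaussian_sigma(clip_norm, epsilon, delta)
    return [t + rng.gauss(0.0, sigma) for t in total]


if __name__ == "__main__":
    # Hypothetical two-dimensional updates from three clients.
    client_updates = [[3.0, 4.0], [0.1, -0.2], [-1.5, 0.5]]
    rng = random.Random(42)  # fixed seed only to make the demo repeatable
    print(noisy_sum(client_updates, clip_norm=1.0, epsilon=0.5, delta=1e-5, rng=rng))
```

Clipping is what ties the noise scale to a known bound: without it, a single client could change the sum by an arbitrary amount and no finite noise level would give a privacy guarantee. A smaller epsilon or delta yields a larger sigma, which is the privacy-budget trade-off the abstract refers to.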
Keywords: Federated Learning, Differential Privacy, Non-IID, GANs