
Research On Key Issues Of Privacy Preservation In Federated Learning

Posted on: 2024-02-28    Degree: Doctor    Type: Dissertation
Country: China    Candidate: L F Zhang    Full Text: PDF
GTID: 1528307118954779    Subject: Management Science and Engineering

Abstract/Summary:
In recent years, thanks to advances in computing, storage, and networking technology, the volume of data held by various platforms has grown exponentially. By learning multidimensional features hidden in these data that are difficult for humans to discern, machine learning methods can achieve strong results. However, the data are often scattered across different institutions and platforms. In practice, the shortage of sample data leads to poor performance of Machine Learning (ML)-based services and is the bottleneck preventing Artificial Intelligence (AI) from realizing its full potential across industries. Integrating big data is an effective way to resolve the tension between enormous practical demand and scarce data resources, but it raises two major challenges. The first is the high cost of integrating the data directly: doing so places enormous strain on infrastructure such as networks, storage, and computing, so consolidating the data into a single data center is impractical. The second is the security risk posed by highly sensitive industry data: financial, medical, and other records contain a great deal of personal information, making many organizations reluctant to share data externally, and a growing number of privacy regulations have been introduced to prevent the re-identification and misuse of private data. To promote the wide application of AI across fields while preventing violations of user privacy in big-data research, technical safeguards must be put in place.

Federated Learning (FL) builds a global model by aggregating local model-parameter updates from the participants without centralizing data on one or a few servers. Private data thus remain on each participant's local device, and only model parameters or gradients are shared. Although FL appears secure, it does not by itself provide the level of privacy and security that today's distributed systems require, and it therefore faces a number of trust challenges, including data-security and privacy violations. Privacy leakage and security attacks in federated learning are thus urgent problems to be solved.

To address these problems, this thesis focuses on privacy leakage and security attacks in FL mounted by untrusted clients or adversaries. Building on Homomorphic Encryption (HE), Differential Privacy (DP), and Secure Multiparty Computation (SMC), it investigates how to construct trusted federated learning methods for privacy preservation and robust aggregation, so as to facilitate FL applications in many fields. The main work and contributions of this thesis are as follows:

(1) A federated learning privacy-preservation scheme based on homomorphic encryption and differential privacy is proposed. Some existing schemes fail to consider the privacy threats that machine learning may face, or adopt a single protective measure that cannot cover the whole machine-learning life cycle; a scheme combining homomorphic encryption and differential privacy is therefore proposed for the secure aggregation of medical image classification. Because existing schemes generally handle a single data modality and a single task type, the MedMNIST benchmark, which is designed around multiple datasets, is selected to conveniently test the universality of the model in the medical field. To prevent adversaries from stealing the original data through model inversion or membership inference attacks, differential-privacy perturbation is applied to the model parameters uploaded by each participant. To prevent semi-honest but curious servers and adversaries from obtaining each party's local model during training, the Paillier cryptosystem is used to encrypt the local model parameters. The security of the scheme is analyzed theoretically: the security model is defined and the security of each subprotocol is proved. Extensive experiments show that the proposed scheme protects the privacy of both the training data and the model without any loss of performance. A minimal sketch of this perturb-encrypt-aggregate flow is given below.
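As a concrete illustration of that flow, the following minimal Python sketch combines Gaussian DP perturbation with Paillier encryption using the open-source `phe` (python-paillier) package. The clipping bound, noise scale, and helper names are illustrative assumptions, not the dissertation's actual protocol or parameters.

```python
# Sketch of contribution (1): each client clips its update, adds Gaussian
# differential-privacy noise, then encrypts it with Paillier so the server
# can aggregate ciphertexts without seeing any individual update.
# Requires the `phe` package (python-paillier).
import numpy as np
from phe import paillier

CLIP_NORM = 1.0      # assumed L2 clipping bound (illustrative)
NOISE_STD = 0.01     # assumed Gaussian noise scale (illustrative)

public_key, private_key = paillier.generate_paillier_keypair(n_length=2048)

def client_update(update: np.ndarray):
    """Clip, perturb, and encrypt one client's parameter update."""
    norm = np.linalg.norm(update)
    if norm > CLIP_NORM:
        update = update * (CLIP_NORM / norm)          # bound sensitivity
    noisy = update + np.random.normal(0.0, NOISE_STD, update.shape)
    return [public_key.encrypt(float(x)) for x in noisy]

def server_aggregate(encrypted_updates):
    """Sum ciphertexts homomorphically; the server never sees plaintexts."""
    total = encrypted_updates[0]
    for enc in encrypted_updates[1:]:
        total = [a + b for a, b in zip(total, enc)]   # additive homomorphism
    return total

# Toy round with 3 clients and a 4-parameter model.
updates = [np.random.randn(4) * 0.1 for _ in range(3)]
agg_enc = server_aggregate([client_update(u) for u in updates])
# Only the key holder can decrypt the aggregate (done here for demonstration).
avg = np.array([private_key.decrypt(c) for c in agg_enc]) / len(updates)
print("decrypted averaged update:", avg)
```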
(2) A federated learning privacy-preservation scheme based on secret sharing and differential privacy is proposed. In the preceding scheme, the server may collude with a participating node, putting the model parameters and training data of honest nodes at risk of disclosure; similar schemes are likewise ineffective against collusion attacks. Moreover, schemes based on homomorphic encryption carry high computation and storage costs, which limits their practicality. In addition, most existing work focuses on privacy preservation for 2D data, while federated learning privacy-preservation methods for 3D data have rarely been studied. Therefore, a privacy-preserving federated learning scheme based on secret sharing and differential privacy is proposed for both 2D and 3D data. To prevent an adversary from stealing the original data through inference attacks, differential-privacy perturbation is applied to the model parameters uploaded by each participant. To prevent an adversary from reconstructing any specific node's local model parameters during training, an additive secret-sharing algorithm is used to additively randomize the local model parameters uploaded by all parties. The global information privacy and collusion resistance of the scheme are proved theoretically. Experimental results show that the scheme's performance metrics are close to those of existing solutions while providing higher data security and stronger resistance to collusion attacks. The sharing step is sketched after this paragraph.
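The additive randomization can be illustrated with a short sketch. The code below uses plain floating-point arithmetic (a real protocol would work over a finite field) and hypothetical helper names; it shows only the core idea that shares sum to the secret while no single party sees any individual client's update.

```python
# Sketch of contribution (2): additive secret sharing of a local model update.
# Each client splits its (DP-perturbed) update into n random shares that sum
# to the original vector; the server only ever reconstructs the sum over
# clients, never an individual update.
import numpy as np

def make_shares(update: np.ndarray, n_parties: int):
    """Split `update` into n_parties additive shares."""
    shares = [np.random.randn(*update.shape) for _ in range(n_parties - 1)]
    shares.append(update - sum(shares))    # last share fixes the total
    return shares

# Toy round: 3 clients, 4-parameter model.
n = 3
updates = [np.random.randn(4) * 0.1 for _ in range(n)]

# Client i sends share j to party j; each party sums the shares it received.
all_shares = [make_shares(u, n) for u in updates]
partial_sums = [sum(all_shares[i][j] for i in range(n)) for j in range(n)]

# Combining the partial sums yields exactly the sum of the raw updates.
aggregate = sum(partial_sums)
assert np.allclose(aggregate, sum(updates))
print("aggregated update:", aggregate / n)
```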
(3) A privacy-preserving robust aggregation algorithm is proposed to defend against Byzantine attacks in FL. In a federated learning scenario, protecting FL from Byzantine attacks while accounting for performance, efficiency, privacy, the number of attackers, and simplicity is a challenging problem. Existing privacy-preservation schemes generally assume an honest-but-curious server, consider privacy only during model training, and usually cannot defend against Byzantine attacks by malicious clients. Among existing defenses against Byzantine attacks in FL, some require the number of malicious clients to be smaller than the number of benign clients; some rely on a clean auxiliary dataset to train detectors or to assist in detecting malicious models, which violates privacy principles; and some impose other limitations, such as verification protocols that are highly complex and require client cooperation, reducing their practicality. Therefore, a privacy-preserving robust aggregation algorithm is proposed that combines the 2-norm distance with double normalization. The algorithm requires no assumptions beyond the training process and adapts to both small and large numbers of attackers. By computing the 2-norm distances between local models, a credit score is assigned to each local model: any underperforming local model update receives a lower weight, and the individual models are then adaptively aggregated according to their credit scores. To prevent adversaries from stealing the original data through model inversion or membership inference attacks, differential-privacy perturbation is applied to each participant's gradient parameters during local training. The robustness of the proposed scheme against Byzantine attacks is proved theoretically. Extensive experiments also confirm that, under Gaussian attacks, the algorithm is more robust to malicious clients and significantly outperforms other mainstream secure aggregation methods in accuracy and other evaluation metrics. A sketch of the credit-scoring step follows.
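The abstract specifies pairwise 2-norm distances, per-model credit scores, and double normalization, but not the exact scoring formula. The sketch below therefore substitutes an inverse-distance score and a max-then-sum normalization as illustrative assumptions.

```python
# Sketch of contribution (3): distance-based credit scoring for robust
# aggregation. Models far from the majority (likely Byzantine) receive low
# credit scores and thus low aggregation weights.
import numpy as np

def robust_aggregate(local_models):
    """Weight each local model by a credit score derived from its 2-norm
    distance to the other models, then return the weighted average."""
    m = np.stack(local_models)                       # shape: (n_clients, dim)
    # Pairwise 2-norm distances between local models.
    dists = np.linalg.norm(m[:, None, :] - m[None, :, :], axis=-1)
    spread = dists.sum(axis=1)                       # total distance to others
    # Credit score: outlying models score low (assumed inverse-distance form).
    scores = 1.0 / (spread + 1e-12)
    # "Double normalization" (assumed form): scale to [0, 1], then to sum 1.
    scores = scores / scores.max()
    weights = scores / scores.sum()
    return weights @ m

# Toy example: 4 benign clients plus one Gaussian attacker.
benign = [np.ones(4) + np.random.normal(0, 0.05, 4) for _ in range(4)]
attack = [np.random.normal(0, 10.0, 4)]              # Gaussian attack update
agg = robust_aggregate(benign + attack)
print("robust aggregate:", agg)                      # stays close to ones(4)
```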
Keywords/Search Tags: Big data, Federated learning, Privacy threats, Privacy preservation, Robust aggregation