With the arrival of the digital economy and the acceleration of global information integration, federated learning (FL), as an effective distributed data mining technology, has become increasingly popular in big data services. However, big data generated by complex systems such as cloud computing, edge computing, the Internet of Things, and industrial control systems is characterized by decentralization, openness, mobility, multiple security domains, and heterogeneity, which poses major challenges for federated learning in terms of data security and quality of service. In this thesis, we focus on several key issues in federated learning, including privacy preservation, the heavy overhead of secure computation, unverifiable computation, malicious attacks, model integrity, and model availability. The main contributions can be summarized as follows:

(1) Lightweight Privacy-Preserving Federated Learning. In distributed data mining, federated learning is required to guarantee data privacy, whereas model parameters also contain sensitive information. Homomorphic encryption is a promising solution that avoids privacy leakage risks while protecting data confidentiality. However, existing privacy-preserving machine learning mechanisms based on homomorphic encryption involve a large amount of secure computation over encrypted data, resulting in a heavy computational burden. Therefore, we design a lightweight privacy-preserving federated learning mechanism that redesigns the extreme gradient boosting (XGBoost) model. In addition, this scheme exchanges encrypted model parameters instead of local data, converting much of the ciphertext computation into plaintext computation and thus realizing lightweight privacy preservation. Security analysis and experimental evaluation demonstrate the security, effectiveness, and efficiency of the scheme.

(2) Unbiased Federated Learning under Heterogeneous Settings. Federated learning suffers from slow convergence and significant accuracy loss due to local biases caused by Non-Independent and Identically Distributed (Non-IID) data. To enhance Non-IID federated learning performance, a straightforward idea is to leverage the Generative Adversarial Network (GAN) to mitigate local biases with synthesized samples. We propose a GAN-based unbiased federated learning scheme that mitigates local biases using samples synthesized by a GAN while preserving user-level privacy in the FL setting. To guarantee user-level privacy, we exploit fully homomorphic encryption to design a privacy-preserving GAN augmentation method for unbiased federated learning. Extensive experiments show that this scheme achieves unbiased federated learning with significant accuracy improvement over two state-of-the-art federated learning baselines trained under different Non-IID settings.

(3) Secure and Verifiable Federated Learning. Federated learning typically involves collaborative training among users, but there is no trusted relationship among them. In such an untrusted environment, a covert adversary may corrupt a number of data domains and execute dishonest secure computation, resulting in inaccurate training or privacy leakage. To prevent dishonest computation and inconsistent inputs, we propose a secure and verifiable federated learning scheme that provides active security against a covert adversary and verifies the correctness of both intermediate parameters and final training results. Our formal security analysis shows that this scheme achieves privacy, completeness, and soundness. Empirical experiments on real-world datasets also demonstrate that the scheme attains high computational efficiency and accuracy.
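To make the verification idea in contribution (3) concrete, the following minimal Python sketch shows how hash commitments can bind participants to their inputs so that an aggregation result can be re-checked afterwards. It is an illustrative toy under simplifying assumptions, not the thesis's actual protocol: updates are opened in the clear here, whereas the actual scheme verifies computation over protected inputs, and all names (Client, commit) are hypothetical.

```python
# Toy sketch: hash commitments let clients check that the aggregator
# used exactly the inputs it claims to have used.
import hashlib
import secrets

def commit(update: list[int], nonce: bytes) -> str:
    """Binding commitment to a local update via SHA-256."""
    data = b",".join(str(x).encode() for x in update) + nonce
    return hashlib.sha256(data).hexdigest()

class Client:
    def __init__(self, update):
        self.update = update
        self.nonce = secrets.token_bytes(16)
        self.commitment = commit(update, self.nonce)

# Each client publishes a commitment before revealing its update.
clients = [Client([1, 2, 3]), Client([4, 5, 6]), Client([7, 8, 9])]
commitments = [c.commitment for c in clients]

# The aggregator collects the opened updates and computes the sum.
opened = [(c.update, c.nonce) for c in clients]
aggregate = [sum(col) for col in zip(*(u for u, _ in opened))]

# Any client can verify that (i) every opened update matches its prior
# commitment and (ii) recomputing the sum reproduces the published result.
assert all(commit(u, n) == com for (u, n), com in zip(opened, commitments))
assert aggregate == [sum(col) for col in zip(*(u for u, _ in opened))]
print("aggregate verified:", aggregate)
```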
(4) Mitigating Model Poisoning Attacks in Federated Learning. Privacy-preserving federated learning is vulnerable to model poisoning attacks launched by a Byzantine adversary, who crafts malicious local gradients to degrade the accuracy of the federated model. The Byzantine adversary submits encrypted poisonous gradients to circumvent existing defense strategies, resulting in encrypted model poisoning attacks. To address this issue, we design a privacy-preserving defense strategy based on two-trapdoor homomorphic encryption, which resists encrypted model poisoning without compromising privacy. Specifically, we first present a secure cosine similarity method to measure the distance between two encrypted gradients. Extensive evaluations on three benchmark datasets show that our scheme achieves robustness under both Independent and Identically Distributed (IID) and Non-IID data.
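As a plaintext illustration of the defense idea in contribution (4), the sketch below filters gradients by cosine similarity to a robust reference before averaging. In the actual scheme this distance is evaluated over encrypted gradients via two-trapdoor homomorphic encryption; the coordinate-wise median reference, the threshold value, and the function names here are assumptions made only for the example.

```python
# Plaintext analogue of a cosine-similarity poisoning defense: gradients
# whose direction deviates sharply from a robust reference update are
# excluded before aggregation.
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def filter_poisoned(gradients: list[np.ndarray], threshold: float = 0.0):
    """Keep gradients whose cosine similarity to the coordinate-wise
    median exceeds the threshold, then average the survivors."""
    reference = np.median(np.stack(gradients), axis=0)
    kept = [g for g in gradients if cosine_similarity(g, reference) > threshold]
    return np.mean(np.stack(kept), axis=0), len(kept)

rng = np.random.default_rng(0)
honest = [rng.normal(1.0, 0.1, size=4) for _ in range(8)]
poisoned = [-10.0 * g for g in honest[:2]]  # sign-flipped, scaled gradients
aggregate, n_kept = filter_poisoned(honest + poisoned)
print(f"kept {n_kept}/10 gradients, aggregate = {aggregate.round(2)}")
```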