In recent years, with the development of the mobile internet and computer technology, numerous interconnected devices such as smartphones and wearable medical devices have permeated people's daily lives, generating or collecting massive amounts of data every day. In this context, distributed learning has emerged as a means of effectively utilizing scattered data resources. In distributed learning, each participant performs local computations on its local data while interacting with other participants to achieve cooperation. Compared with centralized learning, distributed learning makes full use of each participant's local resources, achieves global processing at a small communication cost, and offers high flexibility and robustness. However, since raw data often contains sensitive information, the information exchange among participants carries a risk of privacy leakage. Recently, researchers have proposed a distributed learning strategy, called federated learning, that places greater emphasis on data privacy. It enables multiple participants to jointly build a common learning model by sharing model parameters or gradient updates, thereby avoiding direct disclosure of the raw data. However, studies have shown that attackers can still infer sensitive information about data owners from the transmitted model parameters or gradients.

To address this problem, this thesis investigates how to enable multiple participants holding sensitive data to collaboratively build a learning model while preserving data privacy. We consider horizontally distributed data and vertically distributed data respectively, and conduct a series of studies on the privacy-protection issues faced by distributed learning, taking data heterogeneity and security defense into account. The main works and contributions are summarized as follows.

Firstly, for the distributed estimation problem over multitask networks, we propose a
privacy-preserving distributed multitask learning algorithm. In the proposed algorithm, we design an adaptive cooperation strategy based on the Euclidean distance between neighbors' estimates, which enables each participant to adaptively adjust its combination weights according to the similarity between tasks. Furthermore, we design a privacy-preserving computation protocol that combines random masks with additive homomorphic encryption to implement this nonlinear cooperation strategy, so that participants can cooperate to improve estimation performance while protecting their privacy. The privacy guarantee, complexity, and convergence of the proposed algorithm are analyzed theoretically in detail, and its effectiveness is verified through simulation experiments.

Secondly, the horizontal federated learning system is vulnerable to label-flipping attacks, and existing defense methods suffer severe performance degradation when the data is heterogeneous and cannot protect participants' data privacy. To solve this problem, we propose a label-flipping-robust and privacy-preserving federated learning algorithm. We first analyze the impact of label-flipping attacks on model gradients and propose a detection method based on temporal analysis of cosine similarity, which can accurately identify malicious attackers even when the data distribution is heterogeneous. Furthermore, to prevent the server from inferring private information from the transmitted model gradients, we design a privacy-preserving computation protocol based on homomorphic encryption to implement the detection method and perform model aggregation. In addition, we give a detailed theoretical analysis to demonstrate the privacy guarantee of the proposed protocol, and conduct extensive experiments on real-world datasets to show the effectiveness of our algorithm against label-flipping attacks under various data distributions.

Thirdly, for the utility-privacy trade-off problem in
horizontal federated learning based on differential privacy, we propose a differential-privacy-based robust federated learning algorithm. In the proposed algorithm, we design a classifier-perturbation regularization method that improves the robustness of the classifier against noise, thereby reducing the impact of the noise injected for differential privacy on model performance. We present a theoretical privacy and convergence analysis of the proposed algorithm, give a tight estimate of the total privacy loss, and analyze the influence of key hyperparameters on convergence performance. Experimental results on real datasets show that the proposed algorithm achieves better classification performance than existing differential-privacy-based federated learning algorithms at the same level of privacy protection.

Finally, we study the classification problem on vertically partitioned data. Most existing vertical federated learning algorithms with strict privacy guarantees are built on linear models, which suffer from insufficient generalization performance. To address this problem, we propose a privacy-preserving vertical federated broad learning algorithm, which can achieve generalization performance close to that of deep learning models while strictly protecting each participant's data privacy. Specifically, we design a computation protocol based on double-trapdoor additive homomorphic encryption to achieve privacy-preserving nonlinear feature mapping, model training, and prediction. In addition, we analyze the privacy guarantee of the proposed algorithm in detail, and prove that, apart from the active participant holding the training labels, which can obtain the prediction results, neither the other participants nor the server can obtain any useful information. Experimental results on multiple real datasets show that the proposed algorithm has good generalization performance.
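To illustrate the adaptive cooperation strategy of the first contribution, the following is a minimal plaintext sketch of distance-based combination weights. The inverse-squared-distance rule, function names, and parameters here are illustrative assumptions for exposition, not the thesis's exact formulas, and the encrypted protocol layer is omitted.

```python
import numpy as np

def adaptive_weights(own_est, neighbor_ests, eps=1e-8):
    """Assign larger combination weights to neighbors whose estimates are
    closer (in Euclidean distance) to our own, i.e. likely the same task.
    The inverse-squared-distance rule is an illustrative choice."""
    dists = np.array([np.linalg.norm(own_est - est) for est in neighbor_ests])
    raw = 1.0 / (dists**2 + eps)
    return raw / raw.sum()  # normalize so the weights sum to 1

def combine(own_est, neighbor_ests):
    """Diffusion-style combination step: weighted average of neighbor estimates."""
    w = adaptive_weights(own_est, neighbor_ests)
    return sum(wi * est for wi, est in zip(w, neighbor_ests))

# A neighbor solving a similar task dominates the combination; a neighbor
# solving a different task is effectively ignored.
own = np.array([1.0, 0.0])
neighbors = [np.array([1.1, 0.1]), np.array([5.0, 5.0])]
w = adaptive_weights(own, neighbors)
print(w)
```

In the actual algorithm this weighting is evaluated under random masks and additive homomorphic encryption, so no participant sees its neighbors' raw estimates.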
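The label-flipping detection idea of the second contribution can likewise be sketched in plaintext. Averaging each client's per-round cosine similarity with the mean update direction, and flagging clients whose average is negative, is an illustrative simplification; the thesis's method runs its temporal analysis under homomorphic encryption and its exact statistic may differ.

```python
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def detect_flippers(history, threshold=0.0):
    """history[c] is the list of gradient updates client c sent over rounds.
    Label flipping tends to push a client's updates in a consistently
    opposite direction, so we average each client's cosine similarity with
    the mean update over time and flag persistently low scores.
    Thresholding at 0 is an illustrative choice."""
    rounds = len(next(iter(history.values())))
    scores = {c: 0.0 for c in history}
    for t in range(rounds):
        mean_update = np.mean([history[c][t] for c in history], axis=0)
        for c in history:
            scores[c] += cosine(history[c][t], mean_update) / rounds
    return {c for c, s in scores.items() if s < threshold}

# Two benign clients and one label-flipping client whose updates point
# in the opposite direction round after round.
history = {
    "a": [np.array([1.0, 1.0]), np.array([0.9, 1.1])],
    "b": [np.array([1.1, 0.9]), np.array([1.0, 1.0])],
    "m": [np.array([-1.0, -1.0]), np.array([-1.1, -0.9])],
}
print(detect_flippers(history))
```

Averaging over rounds rather than judging a single round is what makes the detection stable under heterogeneous data, where any one honest update may deviate from the mean.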
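For the third contribution, the differential-privacy side of the trade-off can be illustrated with the standard Gaussian-mechanism step used in differentially private federated learning: clip each gradient's L2 norm, then add calibrated Gaussian noise. The clip bound and noise multiplier below are illustrative values, and the thesis's classifier-perturbation regularization and privacy accounting are not reproduced here.

```python
import numpy as np

def dp_sanitize(grad, clip=1.0, noise_multiplier=1.1, rng=None):
    """Gaussian mechanism on one gradient: rescale so its L2 norm is at most
    `clip`, then add N(0, (noise_multiplier * clip)^2) noise per coordinate."""
    rng = rng or np.random.default_rng(0)
    norm = np.linalg.norm(grad)
    clipped = grad * min(1.0, clip / (norm + 1e-12))
    noise = rng.normal(0.0, noise_multiplier * clip, size=grad.shape)
    return clipped + noise

# With the noise disabled, the output is exactly the clipped gradient.
g = np.array([3.0, 4.0])  # norm 5, so it is scaled down to norm 1
out = dp_sanitize(g, clip=1.0, noise_multiplier=0.0)
print(np.linalg.norm(out))
```

It is exactly this injected noise that degrades model utility, which motivates training the classifier to be robust to perturbations so that accuracy is preserved at a fixed privacy level.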