Research On Federated Learning With Heterogeneous Data

Posted on: 2023-01-07
Degree: Master
Type: Thesis
Country: China
Candidate: Y Y Luo
Full Text: PDF
GTID: 2568307076485294
Subject: Computer Science and Technology
Abstract/Summary:
Federated learning is a distributed machine learning paradigm that trains models on client data through distributed training and model aggregation. It does not require collecting client data for centralized training, which protects client privacy. However, each client's data is usually generated independently, and data features are distributed differently across clients, so the data is heterogeneous. Data heterogeneity slows the convergence of federated learning and has been an important factor affecting both its convergence and its accuracy. To address this problem, the main work of this thesis is as follows:

(1) This thesis designs and implements two data distribution optimization mechanisms. A local data distribution optimization mechanism and a global collaborative data distribution optimization mechanism are proposed to reduce the gap between the local data distribution and the optimal distribution, for the cases of local category imbalance and missing categories, respectively. The local mechanism optimizes the distribution of the training data in each batch. The global collaborative mechanism, aimed mainly at the case of missing categories, uses average gradient reverse generation, a generative adversarial network and a conditional generative adversarial network to construct a collaborative data pool. The experimental results demonstrate that the two mechanisms reduce data heterogeneity among clients and improve the performance of federated learning algorithms in heterogeneous data environments.

(2) This thesis proposes a new interaction training strategy for federated learning, the serial and parallel combination training strategy. Instead of the traditional mode in which all clients train in parallel, the clients are assigned to different pipelines. Within a pipeline, serial training is performed, which reduces the number of local iterations for a single client. Between pipelines, parallel training and model aggregation are still performed in the traditional mode, preserving the model diversity of traditional federated learning. On the one hand, serial training allows the pipeline model to be trained by multiple clients whose data complement each other, reducing the impact of data heterogeneity on the model. On the other hand, the number of local iterations for a single client on the pipeline is reduced, which reduces the client's computation. Since the number of clients remains the same, serial training means that fewer clients communicate with the server, reducing the amount of communication between the server and the clients. Experiments are conducted with different degrees of data heterogeneity on multiple datasets. They demonstrate that the serial and parallel combination training strategy not only improves the accuracy of the federated learning algorithm in heterogeneous data environments, but also effectively reduces client computation and the communication between clients and the server.

(3) This thesis proposes a global knowledge regularization module. Inspired by knowledge distillation, the training of local models is guided by constructed global knowledge. Local knowledge is constructed from the predictions of the local model for each category, and global knowledge is obtained by aggregating the local knowledge. Inspired by continual learning, a moving average is used to fuse the old and new global knowledge, ensuring that the global knowledge is updated stably. Global knowledge is used as a regularization term during local training, so the local models need to fit not only the local data but also the global knowledge. For clients lacking certain categories, the global knowledge becomes the training criterion. Because all local models aim to fit the same global knowledge, the gap between their optimization directions is reduced, which is more conducive to model aggregation. The experiments apply the global knowledge regularization module to several federated learning algorithms and demonstrate that the regularization term improves the aggregation efficiency of the models and achieves higher accuracy by reducing the gap between local models.
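
The global knowledge regularization idea in (3) can be illustrated with a minimal sketch. The abstract does not give formulas, so everything concrete here is an assumption for illustration only: local knowledge is taken to be the mean softmax prediction per category, aggregation is a plain average over the clients that hold a category, the moving average uses a hypothetical `momentum` coefficient, and the regularization term is a KL divergence scaled by a hypothetical `weight`. All function names are invented for this example and are not taken from the thesis.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def local_knowledge(probs, labels, num_classes):
    """Mean predicted distribution per category (assumed form of local knowledge).

    Categories absent from the client's data are represented by None.
    """
    sums = [[0.0] * num_classes for _ in range(num_classes)]
    counts = [0] * num_classes
    for p, y in zip(probs, labels):
        counts[y] += 1
        for k in range(num_classes):
            sums[y][k] += p[k]
    return [
        [s / counts[c] for s in sums[c]] if counts[c] else None
        for c in range(num_classes)
    ]


def aggregate_knowledge(client_knowledge, num_classes):
    """Average each category's vector over the clients that hold that category."""
    result = []
    for c in range(num_classes):
        rows = [k[c] for k in client_knowledge if k[c] is not None]
        if not rows:
            result.append(None)
        else:
            result.append([sum(col) / len(rows) for col in zip(*rows)])
    return result


def update_global_knowledge(old, new, momentum=0.9):
    """Moving-average fusion of old and new global knowledge (momentum is assumed)."""
    fused = []
    for o, n in zip(old, new):
        if n is None:
            fused.append(o)  # no fresh information: keep the old entry
        elif o is None:
            fused.append(n)  # first time this category is observed
        else:
            fused.append([momentum * a + (1 - momentum) * b for a, b in zip(o, n)])
    return fused


def regularized_loss(logits, label, global_knowledge, weight=1.0, eps=1e-12):
    """Cross-entropy on the local label plus an assumed KL term toward global knowledge."""
    p = softmax(logits)
    ce = -math.log(p[label] + eps)
    target = global_knowledge[label]
    if target is None:
        return ce
    kl = sum(t * math.log((t + eps) / (q + eps)) for t, q in zip(target, p) if t > 0)
    return ce + weight * kl
```

In this sketch a client that has never seen a category still receives a target for it once other clients have contributed, which mirrors the abstract's point that global knowledge can serve as the training criterion for missing categories. How the thesis actually applies the term to such clients is not stated in the abstract.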
Keywords/Search Tags: federated learning, data heterogeneity, generative adversarial network, knowledge distillation