| Traditional centralized machine learning requires that all data collected on local devices (e.g., cell phones) be stored centrally in a data center or on a cloud server. This requirement not only raises concerns about privacy risks and data leakage, but also places very high demands on server storage, computing power, and communication capacity when the amount of data is huge. The goal of federated learning is to train models across different devices in a distributed manner without invading user privacy. Although federated learning provides a new paradigm for privacy-preserving machine learning, it faces many problems in practical applications. Differences in data distribution across clients lead to data heterogeneity, while differences in computational power, communication capability, model architecture, or training procedure lead to system heterogeneity. When the heterogeneity is severe, it causes client drift, which destabilizes the convergence curve and degrades model performance. In this paper, we propose two gradient correction methods for a special case of non-independent and identically distributed (non-IID) data, the polarized distribution, in which each client holds only one type of data. The main work is as follows. The first method is a federated optimization algorithm based on pseudo-data sharing (Fed Sfd). The usual approach to non-IID local data is to constrain the server's aggregation and update operations or to intervene in the client's update process. Such methods oscillate during training when the data is not independently and identically distributed. Therefore, the client needs to balance data privacy and model performance. The federated optimization algorithm based on pseudo-data sharing uses a differential autoencoder to generate pseudo-data on the marginal distribution to improve the stability of the training process. At the same time, it also provides optional privacy 
protection parameters so that different clients can apply different privacy restrictions to the generated pseudo-data. Although this method exposes the approximate location of the original data in space, it improves the convergence speed of the model. Experimental results show that this method greatly improves the stability of model training at the cost of some privacy. The second method is a regression-based binary classification federated optimization algorithm based on dual-model integration. Classification models trained on single-class data quickly collapse to local optima, whereas for regression tasks the collapse is much slower. For this reason, a dual-model integrated regression-based binary classification algorithm is proposed. This method first trains the model by regression so that training data of the same category is fitted to a single vector. Because the collapse is slow, data from the category that the model was not trained on incurs a very large loss. The two models are then integrated, and the class of a sample is determined by comparing its loss on the two models. In this way, the global model oscillation caused by aggregating models with different gradient update directions is avoided. Test results on the MNIST and FashionMNIST datasets show that, under the same experimental conditions, the regression-based binary classification federated optimization algorithm based on dual-model integration is superior to other algorithms that address non-IID problems in terms of convergence speed and accuracy. |
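
The decision rule of the dual-model method (assign a sample to the class whose model gives the smaller regression loss) can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: the function names `mse`, `dual_model_predict`, and `linear_model`, the use of mean squared error as the regression loss, the shared target vector, and the hand-set linear weights that stand in for two trained models. It is not the architecture or training procedure used in the paper.

```python
from typing import Callable, List, Sequence

Vector = Sequence[float]
Model = Callable[[Vector], List[float]]


def mse(pred: Vector, target: Vector) -> float:
    """Mean squared error between a prediction and a target vector."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)


def dual_model_predict(model_a: Model, model_b: Model,
                       target_a: Vector, target_b: Vector,
                       x: Vector) -> int:
    """Return 0 if model_a fits x at least as well as model_b, else 1."""
    loss_a = mse(model_a(x), target_a)
    loss_b = mse(model_b(x), target_b)
    return 0 if loss_a <= loss_b else 1


def linear_model(weights: List[List[float]]) -> Model:
    """Hypothetical stand-in for a trained regressor: computes x @ weights."""
    def apply(x: Vector) -> List[float]:
        n_out = len(weights[0])
        return [sum(x[i] * weights[i][j] for i in range(len(x)))
                for j in range(n_out)]
    return apply


# Assumption: both models regress their own class onto the same target vector.
TARGET = [1.0, 1.0]

# Hand-set weights (not learned): model_a maps class-0-like inputs near TARGET
# and class-1-like inputs far from it; model_b does the opposite.
model_a = linear_model([[1.0, 1.0], [-3.0, -3.0]])
model_b = linear_model([[-3.0, -3.0], [1.0, 1.0]])

samples = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
predictions = [dual_model_predict(model_a, model_b, TARGET, TARGET, x)
               for x in samples]
```

With these stand-in weights, the first two samples are assigned to class 0 and the last two to class 1, because each sample has a small loss on one model and a large loss on the other.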