
Research on Federated Learning Methods for Unbalanced Data

Posted on: 2022-06-14
Degree: Master
Type: Thesis
Country: China
Candidate: Y Liu
Full Text: PDF
GTID: 2518306338467124
Subject: Electronics and Communications Engineering
Abstract/Summary:
With the arrival of the big data era, data from the same or different industries can be mined to produce great value using artificial intelligence (AI) based technologies. Traditional centralized AI techniques gather the data in one place and extract information for a specific task. However, with the rapid development and application of big data, data security and privacy protection have attracted much more attention, and many laws and regulations have been issued to restrict the arbitrary circulation and use of private data, which leads to the "data islands" problem. In response, Google proposed federated learning (FL), a machine learning framework that enables data owners to share the value of their data, rather than the data itself, through collaborative training while preserving data security and privacy. The Federated Averaging algorithm proposed by Google, a mainstream FL method, shares learned knowledge by computing and sharing model parameters or gradient information, and has the following shortcomings: 1) severe performance degradation when the distribution of object classes is significantly imbalanced across partners; 2) security risks introduced by sharing model parameters. In this thesis, two improved FL models are developed to address these two problems on severely imbalanced datasets. For severely imbalanced datasets with low privacy requirements on the machine learning model, an FL model based on model parameter sharing is designed, which modifies the training procedure by iteratively transferring model parameters between partners. Its performance is evaluated on an open dataset, and the results verify that it approaches the performance of its centralized counterpart. For severely imbalanced data with much stricter privacy requirements on the machine learning model, an FL model based on knowledge distillation and transfer is developed, which can flexibly support heterogeneous models across partners. Moreover, a method for generating a fake public dataset is designed to solve the problem of insufficient public data, which greatly improves classification accuracy.
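
To illustrate the parameter sharing that Federated Averaging relies on, the server-side aggregation step can be sketched as a sample-count-weighted average of the partners' parameters. This is a minimal sketch of the standard aggregation rule, not the thesis's implementation: the function name `federated_average`, the flat dictionary parameter format, and the toy values are assumptions made for illustration.

```python
from typing import Dict, List

# Hypothetical parameter format: layer name -> flat list of weights.
Params = Dict[str, List[float]]


def federated_average(client_params: List[Params], client_sizes: List[int]) -> Params:
    """Average client parameters, weighting each client by its local sample count."""
    if not client_params or len(client_params) != len(client_sizes):
        raise ValueError("need exactly one sample count per client")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total sample count must be positive")

    averaged: Params = {}
    for name in client_params[0]:
        length = len(client_params[0][name])
        # Weighted sum over clients for each coordinate, normalized by the total count.
        averaged[name] = [
            sum(p[name][i] * n for p, n in zip(client_params, client_sizes)) / total
            for i in range(length)
        ]
    return averaged


# Toy example (assumed values): two partners holding 1 and 3 samples.
global_params = federated_average(
    [{"w": [0.0, 4.0]}, {"w": [4.0, 0.0]}],
    [1, 3],
)
print(global_params)  # {'w': [3.0, 1.0]}
```

Because each partner's weight depends only on how many samples it holds, a partner whose data covers only a few classes can pull the averaged model toward those classes, which is consistent with the degradation under class imbalance noted in the abstract.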
Keywords/Search Tags: federated learning, machine learning, unbalanced and highly sensitive data sets, knowledge distillation