
Design And Implementation Of Federated Learning Algorithms For Heterostructure And Heterogeneous Data

Posted on: 2024-04-06    Degree: Master    Type: Thesis
Country: China    Candidate: L J Peng    Full Text: PDF
GTID: 2568307079472454    Subject: Electronic information
Abstract/Summary:
Data has emerged as a new factor of production in the era of the digital economy, carrying both resource value and strategic importance. Machine learning algorithms, as the main drivers of data value extraction, have been widely applied across industries and yield significant economic benefits. However, as legal regulation matures, privacy protection requirements have become increasingly stringent, and the era of consolidating heterogeneous, differently structured data into a single dataset for mining has passed. Instead, data now sits in isolated "data silos": correlated data exists in structurally or statistically heterogeneous form and cannot be physically centralized, so solutions must be designed around the actual distribution of the data in order to extract its value. This demand raises several challenges. When features are split across differently structured datasets at different physical sites, those features must be combined and their correlations identified, yet traditional centralized recommendation algorithms cannot run in such a distributed environment and raise data-privacy concerns. When data distributions are highly heterogeneous, improving algorithm performance and convergence speed becomes crucial. This thesis addresses these challenges by drawing on federated learning, knowledge distillation, and related theory. The main contributions are as follows.

First, the Deep Factorization Machine (DeepFM) algorithm is adapted to distributed storage of structurally heterogeneous data and extended with a vertical federated learning approach. The computation structure of DeepFM is analyzed in detail and redesigned to meet the requirements of a distributed computing environment, and privacy protection based on obfuscation and differential privacy keeps intermediate data private (the first sketch below illustrates this kind of intermediate-output perturbation). The performance of the improved algorithm is comparable to that of the original centralized algorithm.

Second, the federated distillation technique from horizontal federated learning is applied to address statistically heterogeneous (non-IID) data. Distilling over a public dataset reduces the need for encryption during training and resolves the privacy issues surrounding intermediate data. The factor-transfer technique is introduced to exploit the intermediate knowledge inside local models, with paraphraser and translator modules transferring knowledge factors and thereby accelerating convergence. On this basis, the distance between the global convergence point and each local convergence point is estimated from the local training loss of each node; by adjusting the learning rate of local training, the global convergence point can be shifted toward a relatively fair convergence point for all nodes, effectively improving performance in highly heterogeneous distributed storage environments (the second sketch below illustrates these two ideas).
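The following sketch, written in Java for consistency with the system described later, shows one way a vertical-federated DeepFM score could be split across parties, with each party perturbing its partial result before sharing it. It is a minimal illustration under assumed class and method names, not the thesis code, and the local interaction term is deliberately simplified.

```java
import java.util.Random;

/**
 * Minimal sketch (not the thesis implementation) of a vertically federated
 * DeepFM-style scoring step: each party scores only its own feature slice and
 * perturbs the partial result, so the coordinating party only sees noisy
 * intermediate values. All names here are illustrative assumptions.
 */
public class VerticalDeepFmSketch {

    /** Partial score produced by one party over its local feature slice. */
    static double partialScore(double[] localFeatures, double[] linearWeights,
                               double[][] embeddings) {
        // First-order (linear) term over this party's features only.
        double linear = 0.0;
        for (int i = 0; i < localFeatures.length; i++) {
            linear += linearWeights[i] * localFeatures[i];
        }
        // Simplified stand-in for the local part of the FM interaction /
        // deep component: a pooled-embedding quadratic term.
        double[] pooled = new double[embeddings[0].length];
        for (int i = 0; i < localFeatures.length; i++) {
            for (int k = 0; k < pooled.length; k++) {
                pooled[k] += localFeatures[i] * embeddings[i][k];
            }
        }
        double interaction = 0.0;
        for (double v : pooled) {
            interaction += 0.5 * v * v;
        }
        return linear + interaction;
    }

    /** Laplace-style perturbation of an intermediate value (differential-privacy flavour). */
    static double perturb(double value, double scale, Random rng) {
        double u = rng.nextDouble() - 0.5;
        return value - scale * Math.signum(u) * Math.log(1 - 2 * Math.abs(u));
    }

    /** The coordinating party combines noisy partial scores and applies the sigmoid. */
    static double combine(double[] noisyPartialScores) {
        double sum = 0.0;
        for (double s : noisyPartialScores) {
            sum += s;
        }
        return 1.0 / (1.0 + Math.exp(-sum));
    }
}
```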
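The next sketch illustrates, under assumed interfaces, the two horizontal-federation ideas summarized above: exchanging predictions on a shared public dataset instead of model parameters, and scaling each node's learning rate by how far its training loss sits from the average. The averaging and scaling rules shown are assumptions for illustration, not the thesis's exact formulas.

```java
/**
 * Minimal sketch of (1) federated distillation, where clients share logits on
 * a public dataset rather than model parameters, and (2) a loss-aware
 * learning-rate adjustment meant to pull the global model toward a fairer
 * convergence point. Both rules below are illustrative assumptions.
 */
public class FederatedDistillationSketch {

    /** Server-side step: average per-client logits on the public dataset. */
    static double[][] aggregateSoftLabels(double[][][] clientLogits) {
        int numSamples = clientLogits[0].length;
        int numClasses = clientLogits[0][0].length;
        double[][] avg = new double[numSamples][numClasses];
        for (double[][] logits : clientLogits) {
            for (int s = 0; s < numSamples; s++) {
                for (int c = 0; c < numClasses; c++) {
                    avg[s][c] += logits[s][c] / clientLogits.length;
                }
            }
        }
        return avg;  // broadcast back to clients as distillation targets
    }

    /**
     * Client-side step: scale the local learning rate by the ratio of the
     * node's training loss to the mean loss, so poorly fitted nodes pull the
     * global convergence point toward themselves (assumed rule).
     */
    static double adjustedLearningRate(double baseLr, double localLoss, double meanLoss) {
        if (meanLoss <= 0) {
            return baseLr;
        }
        return baseLr * (localLoss / meanLoss);
    }
}
```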
Finally, this thesis designs and implements a federated learning system in the Java environment. The system supports both horizontal and vertical federated learning and provides interfaces for training control, logging, and computation-structure construction, among other functions, enabling secondary development (an illustrative sketch of such extension points follows).
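As an illustration only, the extension points of such a system might resemble the hypothetical Java interfaces below; the names are invented for this sketch and are not the thesis system's actual API.

```java
/**
 * Hypothetical extension points for a Java federated-learning system that
 * supports training control, logging, and computation-structure construction.
 */
public interface TrainingController {
    /** Start a federated job described by a computation-structure definition. */
    void startJob(String jobId, ComputationGraphSpec spec);

    /** Stop a running job and release participant resources. */
    void stopJob(String jobId);
}

interface ComputationGraphSpec {
    /** Whether the job runs in horizontal or vertical federated mode. */
    String federationMode();
}

interface TrainingLogger {
    /** Record a per-round metric such as loss or accuracy. */
    void logMetric(String jobId, int round, String name, double value);
}
```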
Keywords/Search Tags: Federated learning, Machine learning, Knowledge distillation