In recent years, the vigorous development of Internet of Things (IoT) technology has spawned a sharp increase in the number of interconnected devices. As a result, massive amounts of data are generated by the rapidly proliferating IoT devices and, in the traditional computing paradigm, are delivered to the centralized cloud for model training. However, centralized processing of IoT data consumes a large amount of bandwidth and incurs significant response delay, making it unsuitable for latency-sensitive applications. Additionally, transmitting raw data to the cloud may lead to privacy leakage, since most user data produced by IoT devices are sensitive and private. To this end, Edge Computing (EC) pushes computation, network, storage, and other infrastructure from the cloud to the network edge (e.g., base stations), so that IoT data can be processed on edge nodes close to the data source. Compared with the traditional computing paradigm, EC avoids the long-haul transmission of raw data, reducing both the communication cost and the response delay; furthermore, user privacy can be preserved to some extent. Moreover, since edge nodes are isolated from one another, Federated Learning (FL) has been proposed to coordinate multiple edge nodes for model training, so as to make better use of IoT data.

The typical communication architectures in edge computing networks can be divided into three categories: centralized, semi-centralized, and decentralized. Under the centralized communication architecture, each edge node uploads its local model only to the centralized server, which aggregates the local models to update the global model. Under the semi-centralized communication architecture, in addition to uploading its local model to the server, each edge node communicates directly with other nodes. Under the decentralized communication architecture, all edge nodes communicate with their neighbors in a Peer-to-Peer (P2P) manner, without relying on the centralized
server. Although FL has been used for model training in various EC applications, the heterogeneity in edge systems (e.g., system heterogeneity and statistical heterogeneity) still significantly affects the training performance of FL (e.g., completion time, network traffic consumption, and test accuracy) under all three architectures. To this end, focusing on the three typical communication architectures (centralized, semi-centralized, and decentralized), this dissertation studies performance optimization for FL in heterogeneous edge systems, so as to tackle system heterogeneity and statistical heterogeneity in EC scenarios. The key contributions of this dissertation are summarized as follows:

(1) Under the centralized communication architecture, the synchronous mechanism in FL leads to a synchronization barrier, causing unnecessary waiting time among edge nodes. Moreover, the negative effect of the synchronization barrier is worsened by system heterogeneity, which further decreases the resource utilization of edge nodes. To this end, this dissertation proposes to jointly optimize the FL hyperparameters (i.e., batch size and learning rate). Firstly, we adaptively adjust the batch size of each node according to its processing capacity, so as to narrow the gap among nodes' epoch times and thereby alleviate the negative effect of the synchronization barrier. However, adjusting the batch size based on processing capacity may affect the gradient variance and degrade the convergence rate. In addition, the learning rate is one of the most important hyperparameters affecting the convergence rate of the global model. Therefore, we theoretically analyze the relationship between batch size and learning rate, and derive a scaling rule to guide the configuration of the learning rate. By jointly optimizing the batch size and learning rate, we can reduce the waiting time of edge nodes while guaranteeing a satisfactory convergence rate. Extensive experimental results demonstrate that the
proposed solution can improve the test accuracy by 4% given the resource budget, and reduce the completion time by at most 46.8% when achieving the target accuracy, compared with existing methods.

(2) Under the semi-centralized communication architecture, the model staleness issue causes performance degradation of the global model on Non-Independent and Identically Distributed (Non-IID) data. To address this issue, this dissertation proposes FedLC, a novel asynchronous FL mechanism that enables local collaboration among edge nodes. Specifically, at each epoch, apart from uploading its local model to the server, each node transmits its gradient to other nodes for local collaboration, according to its available communication resources and data distribution. We theoretically analyze the convergence rate of FedLC and obtain a convergence upper bound related to the collaboration relationship. Based on this bound, we design an efficient algorithm built on a demand-list, where each row of the demand-list represents the collaborating neighbor set of the corresponding node. In addition, to mitigate model staleness in asynchronous FL, we adjust the local learning rate of each edge node based on its participation frequency in the global update, so as to compensate for nodes with poor performance. Extensive experiments show that FedLC can improve the model accuracy by up to 25% under resource constraints, and save approximately 51% of the completion time while achieving the target accuracy, compared with the baselines.

(3) Under the decentralized communication architecture, the communication heterogeneity among edge nodes seriously affects the efficiency of model training, while statistical heterogeneity poses challenges to users' requirements on model performance. To this end, this dissertation proposes a novel FL method that combines adaptive model pruning and neighbor selection. Specifically, each node prunes its local model according to its local gradient and its heterogeneous communication
resources. By transmitting the pruned models among edge nodes, the communication cost of model training can be reduced. In addition, we adopt gradient-based cosine similarity as the metric to perceive differences among local data distributions; that is, only the neighbors whose gradient-based cosine similarity exceeds a predefined threshold are selected. By aggregating the local models from neighbors with similar data distributions, users' requirements on model performance can be satisfied. We theoretically analyze the convergence rate of the proposed method, and study the impact of model pruning and neighbor selection on model performance. Experimental results show that the proposed solution can improve the test accuracy by 13% given the same time budget, and save 45.4% of the network traffic on average when training for the same number of epochs.
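The neighbor-selection step described in contribution (3) can be illustrated with a minimal sketch. The function name, vector representation, and threshold value below are assumptions for illustration only, not the dissertation's actual implementation:

```python
import numpy as np

def select_neighbors(own_grad, neighbor_grads, threshold=0.5):
    """Select neighbors whose gradient direction is similar to this node's.

    own_grad: flattened local gradient vector of this node.
    neighbor_grads: dict mapping neighbor id -> flattened gradient vector.
    threshold: hypothetical cutoff; the actual value is a tunable parameter.
    """
    selected = []
    for nid, grad in neighbor_grads.items():
        # Cosine similarity between the two gradient vectors; a small
        # epsilon guards against division by zero for all-zero gradients.
        cos = float(np.dot(own_grad, grad) /
                    (np.linalg.norm(own_grad) * np.linalg.norm(grad) + 1e-12))
        # Only neighbors above the threshold are kept, i.e., neighbors
        # whose local data distribution is likely similar to this node's.
        if cos > threshold:
            selected.append(nid)
    return selected
```

A node would then aggregate only the (pruned) models received from the returned neighbor set, rather than from all of its P2P neighbors.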