| In recent years, with the growing number of Internet of Things (IoT) devices, the volume of data has been increasing explosively. Interconnecting devices through IoT platforms is the basis for realizing intelligent networks. Artificial intelligence (AI) techniques are commonly adopted to analyze the massive data generated by IoT devices, so as to improve the quality of products and services. In practice, the most common way of combining IoT and AI is to train models on a cloud server using the data received from IoT devices; the trained model is then used to analyze the sensing data (for example, for knowledge inference). However, as data grow exponentially, sending all of them to the cloud server for model training consumes more network bandwidth and incurs higher transmission delay, which makes this mode unsuitable for applications that require low delay and low bandwidth consumption. Edge computing can be applied to the integration of IoT and AI to reduce bandwidth consumption, improve network transmission performance, and better protect the privacy of sensitive data. However, the performance of distributed model training (or federated learning) degrades significantly in edge computing networks because of limited resources, system heterogeneity, dynamic networks, and imbalanced data. To address these challenges, this paper studies model training with high-precision requirements in edge computing. The main contents and contributions are as follows: 1. To address the limited resources of switches (flow tables) and servers, this paper proposes a wildcard-based incremental deployment of servers and network functions, which provides a basic guarantee for distributed model training. Existing work mainly focuses on reducing the cost of server deployment while ignoring the resource constraints of switches (such as the limited 
TCAM capacity). Therefore, when many tasks or data flow requests are routed through the network, a large number of flow table entries (or forwarding rules) must be installed on the switches, resulting in massive control overhead. To solve this problem, we formulate an incremental server deployment (INSD) problem to construct a scalable edge computing network. We prove that the INSD problem is NP-hard and that, unless P = NP, it admits no polynomial-time algorithm with a constant approximation ratio. We then present an efficient algorithm with an approximation ratio of 2·H(q·p), where q is the number of virtual network function (VNF) categories and p is the maximum number of requests passing through a switch. We evaluate the algorithm through large-scale simulations with Pica8 switches and virtual Open vSwitch (OVS) instances, and the results show that it scales well. Compared with existing solutions, the proposed scheme reduces the number of forwarding rules by about 88% and the control overhead by about 82%, at the cost of about 5% higher server deployment cost. 2. To address the long training time caused by the synchronization barrier in distributed model training (or federated learning), this paper proposes an adaptive asynchronous federated learning (AAFL) mechanism for dynamic networks. In each epoch, the parameter server performs the global update according to the arrival order of the local updates sent by the clients. Specifically, only α·n local updates participate in each global update, where 0<α<1 and n is the total number of clients in the network. We then theoretically analyze the convergence rate of model training in AAFL and derive a convergence upper bound related to α. To adapt to changes in the dynamic network environment, we adopt deep reinforcement learning (DRL): taking the training state and network resources as the input of the DRL system, we obtain the optimal α value for each round. Finally, we conducted extensive simulations and test-bed 
experiments, and the results demonstrate the efficiency of the proposed method. For example, AAFL reduces the training time by about 69% while achieving training performance similar to that of the synchronous scheme. In addition, under resource constraints, AAFL improves the test accuracy by about 18% compared with the other benchmarks. 3. To address the performance degradation caused by data imbalance in edge computing, this paper proposes an efficient framework that integrates a model migration strategy into the pioneering FL algorithm. Since the clients are in different geographical locations, the data collected by their edge devices vary significantly. As a result, the data on the edge devices are not independent and identically distributed (non-IID), which degrades the accuracy and convergence rate of global model training. To address the non-IID challenge, the proposed scheme directs a client to forward its local model to another client, which is equivalent to training the model on more data from different clients. We first analyze the convergence of the proposed method and prove that it reduces the parameter difference between the global model obtained by distributed training and the model obtained by centralized training. We then formulate the federated learning with model migration (FLMM) problem and propose a DRL-based method to determine the migration policy among clients. Extensive experiments on three classical datasets show that, under resource constraints, the proposed scheme improves the test accuracy by about 13%, and that it reduces bandwidth consumption by about 42% while achieving performance similar to that of existing solutions. 4. To alleviate network congestion at the parameter server (PS) caused by frequent communication between the PS and the clients, this paper proposes a decentralized federated learning (DFL) mechanism with probabilistic communication. The mechanism also effectively addresses the problem of 
accuracy degradation caused by system heterogeneity and data imbalance. Specifically, we adopt peer-to-peer (P2P) communication to relieve the load on the server, and we propose an efficient approximation algorithm that assigns an appropriate communication probability to each link according to the resources and data distributions of the clients. Extensive experiments on classical models and datasets show the efficiency of the proposed scheme: compared with state-of-the-art solutions, it reduces the completion time of model training by about 55% and improves the test accuracy by about 11% under bandwidth constraints. Through the above methods, this paper effectively addresses the test accuracy degradation caused by limited resources, system heterogeneity, and data imbalance in edge computing. In addition, we design and develop a distributed model training system to verify the proposed methods. |
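The approximation ratio 2·H(q·p) from contribution 1 is easier to interpret with numbers. The abstract does not define H(·); this minimal sketch assumes it is the k-th harmonic number H(k) = 1 + 1/2 + ... + 1/k, as is usual for greedy covering-style bounds. The function names and the sample values of q and p are hypothetical, and the code only evaluates the bound; it is not the INSD algorithm.

```python
from fractions import Fraction
import math


def harmonic(k: int) -> Fraction:
    """Exact k-th harmonic number H(k) = 1 + 1/2 + ... + 1/k (assumed meaning of H)."""
    if k < 1:
        raise ValueError("k must be a positive integer")
    return sum((Fraction(1, i) for i in range(1, k + 1)), Fraction(0))


def insd_ratio_bound(q: int, p: int) -> Fraction:
    """Evaluate the bound 2 * H(q * p).

    q: number of VNF categories; p: maximum number of requests through one switch.
    """
    return 2 * harmonic(q * p)


if __name__ == "__main__":
    # Hypothetical sizes: 3 VNF categories, at most 10 requests per switch.
    q, p = 3, 10
    bound = insd_ratio_bound(q, p)
    # H(k) <= ln(k) + 1, so the guarantee grows only logarithmically in q * p.
    print(f"2*H({q * p}) = {float(bound):.4f}")
    print(f"2*(ln({q * p}) + 1) = {2 * (math.log(q * p) + 1):.4f}")
```

For q = 3 and p = 10 the bound is about 7.99. Exact `Fraction` arithmetic avoids rounding in the harmonic sum; for large q·p the logarithmic estimate is the more practical form.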
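The AAFL aggregation rule from contribution 2 (the parameter server uses only the first α·n local updates to arrive) can be sketched as below. Several details are assumptions not stated in the abstract: local updates are model deltas stored as lists of floats, the ⌈α·n⌉ earliest arrivals are averaged with equal weights and added to the global model, and α is passed in directly. In AAFL, α is chosen each round by the DRL agent, which is not modeled here. The helper name `aafl_aggregate` is hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def aafl_aggregate(
    global_model: Sequence[float],
    local_updates: Sequence[Sequence[float]],
    arrival_times: Sequence[float],
    alpha: float,
) -> Tuple[List[float], List[int]]:
    """Apply one global update using only the first ceil(alpha * n) local updates.

    Hypothetical helper: equal-weight averaging of model deltas is an assumption.
    Returns the new global model and the indices of the participating clients.
    """
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must satisfy 0 < alpha < 1")
    n = len(local_updates)
    if n == 0 or len(arrival_times) != n:
        raise ValueError("need one arrival time per local update")
    k = max(1, math.ceil(alpha * n))
    # Serve clients in arrival order; updates after the first k are ignored this round.
    participants = sorted(range(n), key=lambda i: arrival_times[i])[:k]
    new_model = [
        w + sum(local_updates[i][d] for i in participants) / k
        for d, w in enumerate(global_model)
    ]
    return new_model, participants


if __name__ == "__main__":
    # Toy round: 4 clients, 2-dimensional model, alpha = 0.5, so 2 updates are used.
    model, used = aafl_aggregate(
        global_model=[0.0, 0.0],
        local_updates=[[1.0, 1.0], [2.0, 2.0], [3.0, 3.0], [4.0, 4.0]],
        arrival_times=[0.4, 0.1, 0.3, 0.2],
        alpha=0.5,
    )
    print(model, used)
```

In the toy round, clients 1 and 3 arrive first, so the global model moves by the mean of their deltas to [3.0, 3.0]. The server never waits for the two slowest clients, which is how the synchronization barrier is avoided.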
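The intuition behind model migration in contribution 3 can be illustrated on a toy problem. In this sketch each client holds a scalar model. Local training is a few gradient steps on the hypothetical loss (w − μ_i)², where μ_i stands in for client i's local data. The migration policy is given as a one-to-one mapping from sender to receiver; in the paper it is chosen by a DRL method, which is not modeled. All function names are hypothetical. The sketch only shows that a migrated model is trained on two clients' data within one round. It does not reproduce the paper's convergence analysis.

```python
from typing import Dict, List, Sequence


def local_train(w: float, mu: float, lr: float = 0.1, steps: int = 5) -> float:
    """Gradient descent on the toy loss (w - mu)^2; mu stands in for a client's data."""
    for _ in range(steps):
        w -= lr * 2.0 * (w - mu)
    return w


def migrate(models: Sequence[float], policy: Dict[int, int]) -> List[float]:
    """Forward models according to policy[src] = dst (hypothetical one-to-one policy).

    A client that receives no model keeps its own.
    """
    if len(set(policy.values())) != len(policy):
        raise ValueError("each client may receive at most one model")
    moved = list(models)
    for src, dst in policy.items():
        moved[dst] = models[src]
    return moved


def flmm_round(models: Sequence[float], client_means: Sequence[float],
               policy: Dict[int, int]) -> List[float]:
    """Local training, migration, then further training on the receiver's data."""
    trained = [local_train(w, mu) for w, mu in zip(models, client_means)]
    received = migrate(trained, policy)
    return [local_train(w, mu) for w, mu in zip(received, client_means)]


if __name__ == "__main__":
    start = [5.0, 5.0]
    means = [0.0, 10.0]                       # two clients with very different (non-IID) data
    no_mig = flmm_round(start, means, {})     # empty policy: plain local training
    with_mig = flmm_round(start, means, {0: 1, 1: 0})  # the clients swap models mid-round
    # The gap between client models shows how far they drift apart on non-IID data.
    print("gap without migration:", round(abs(no_mig[1] - no_mig[0]), 3))
    print("gap with migration:   ", round(abs(with_mig[1] - with_mig[0]), 3))
```

With these toy settings the gap between the two client models shrinks from about 8.93 without migration to about 4.52 with a swap. Each migrated model has been pulled toward both clients' data.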
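The probabilistic P2P communication in contribution 4 can be modeled as gossip over randomly activated links. This sketch assumes the link probabilities are already given; the paper computes them with its approximation algorithm, which is not reproduced here. In every round, each undirected link is activated independently with its probability. Each client then replaces its model with the plain average of its own model and those of its active neighbors. The names are hypothetical, and this simple averaging rule is an illustrative choice rather than the paper's aggregation.

```python
import random
from typing import Dict, List, Sequence, Tuple

Edge = Tuple[int, int]


def sample_links(link_probs: Dict[Edge, float], rng: random.Random) -> List[Edge]:
    """Activate each undirected link independently with its given probability."""
    return [edge for edge, p in link_probs.items() if rng.random() < p]


def gossip_round(models: Sequence[float], link_probs: Dict[Edge, float],
                 rng: random.Random) -> List[float]:
    """One decentralized round: no parameter server, only exchanges over active links."""
    neighbours: Dict[int, List[int]] = {i: [] for i in range(len(models))}
    for i, j in sample_links(link_probs, rng):
        neighbours[i].append(j)
        neighbours[j].append(i)
    # Each client averages its own model with the models received from active neighbours.
    return [
        (models[i] + sum(models[j] for j in neighbours[i])) / (1 + len(neighbours[i]))
        for i in range(len(models))
    ]


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical 4-client ring; better-provisioned links get higher probabilities.
    probs = {(0, 1): 0.9, (1, 2): 0.5, (2, 3): 0.9, (3, 0): 0.2}
    models = [0.0, 1.0, 2.0, 3.0]
    for _ in range(20):
        models = gossip_round(models, probs, rng)
    print([round(m, 3) for m in models])
```

Each new model is a convex combination of current models, so the spread between clients can never grow and shrinks whenever links are active. Because this plain neighborhood average is not doubly stochastic, the consensus value need not equal the initial mean unless every client has the same number of active neighbors, as in a fully active ring.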