With the development of edge computing, more and more edge AI devices with high computing power have emerged. Compared with cloud computing clusters, edge devices offer low latency, low energy consumption, low cost, small size, easy deployment, and high heterogeneous flexibility. At the same time, tasks in edge scenarios are growing more complex, often requiring several different deep learning models to cooperate on a single task. Edge devices therefore usually need to deploy multiple models simultaneously for concurrent inference, which raises many problems worth studying.

We first measure the performance degradation and power consumption of different lightweight deep learning models under concurrent inference on different edge devices, and characterize how well various heterogeneous edge devices accommodate concurrency for different models. We then propose an integer linear programming method that, under constraints on the price budget and the number of edge nodes, maximizes the energy-efficiency benefit of assigning specific models to devices, thereby optimizing both edge-device selection and model deployment. The optimization results across specific models and a range of devices demonstrate the effectiveness of this method and its scalability in both performance and power consumption, providing a theoretical basis for edge heterogeneity and illustrating its advantages.

Once an edge cluster is built and the models are deployed, meeting the high-throughput and low-latency requirements of edge inference raises the problem of scheduling hybrid tasks across the heterogeneous cluster. We therefore propose a scheduling algorithm based on reinforcement learning, which uses the posterior results measured at actual runtime as learning samples, automatically learns the differences between devices from the performance degradation observed for each task in each scheduling round, achieves heterogeneity awareness, and infers the optimal scheduling decision for the current state. We verify the effectiveness of our method experimentally. In addition, a task-offloading mechanism is added to our architecture to compensate for the limited accuracy of edge inference. The latency growth observed under different confidence thresholds reflects the high cost of cloud offloading, so offloading tasks to the cloud should be avoided as much as possible.
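The device-selection problem described above can be viewed as a small 0/1 integer program: choose a subset of candidate devices that maximizes the total energy-efficiency benefit for the target model, subject to a price budget and a cap on the number of nodes. The following is a minimal, self-contained sketch that solves a toy instance by exhaustive search rather than a real ILP solver; all device names, prices, and energy-efficiency values are hypothetical illustrations, not measured data from this work.

```python
from itertools import product

# Hypothetical candidate devices: (name, price, energy-efficiency benefit
# for the target model under concurrent inference). Values are invented.
DEVICES = [
    ("jetson_nano", 99, 4.2),
    ("jetson_tx2", 399, 6.8),
    ("raspberry_pi4", 55, 1.5),
    ("edge_tpu", 130, 7.1),
]

BUDGET = 600    # total price budget constraint
MAX_NODES = 4   # constraint on the number of edge nodes

def best_selection(devices, budget, max_nodes):
    """Maximize total energy-efficiency benefit subject to the price-budget
    and node-count constraints, with a 0/1 decision per device."""
    best_value, best_pick = 0.0, ()
    for pick in product((0, 1), repeat=len(devices)):
        price = sum(x * d[1] for x, d in zip(pick, devices))
        nodes = sum(pick)
        if price <= budget and nodes <= max_nodes:
            value = sum(x * d[2] for x, d in zip(pick, devices))
            if value > best_value:
                best_value, best_pick = value, pick
    chosen = [d[0] for x, d in zip(best_pick, devices) if x]
    return chosen, best_value

chosen, value = best_selection(DEVICES, BUDGET, MAX_NODES)
print(chosen, value)
```

For realistic numbers of devices, the same 0/1 formulation would be handed to an ILP solver instead of enumerated; the brute-force loop is only there to make the objective and constraints concrete.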
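The heterogeneity-aware scheduler learns from posterior runtime measurements rather than from a prior device model. As a simplified stand-in for the reinforcement-learning algorithm (whose exact form is not specified here), the sketch below uses an epsilon-greedy scheme that learns a per-(task type, device) value estimate from noisy degradation measurements; the task types, devices, and degradation numbers are all hypothetical.

```python
import random

random.seed(0)

# Hypothetical mean performance degradation (lower is better) of each task
# type on each device. The scheduler cannot see this table directly and
# must learn it from posterior measurements taken at runtime.
TRUE_DEGRADATION = {
    ("detection", "jetson_tx2"): 0.10,
    ("detection", "raspberry_pi4"): 0.55,
    ("classification", "jetson_tx2"): 0.20,
    ("classification", "raspberry_pi4"): 0.15,
}
DEVICES = ["jetson_tx2", "raspberry_pi4"]

# Q[(task, device)]: learned reward estimate (negative degradation).
Q = {key: 0.0 for key in TRUE_DEGRADATION}
counts = {key: 0 for key in TRUE_DEGRADATION}

def schedule(task, epsilon=0.1):
    """Epsilon-greedy decision: usually pick the device with the best
    learned value for this task type, occasionally explore."""
    if random.random() < epsilon:
        return random.choice(DEVICES)
    return max(DEVICES, key=lambda d: Q[(task, d)])

def observe(task, device):
    """Posterior result: noisy degradation measured at actual runtime."""
    return TRUE_DEGRADATION[(task, device)] + random.gauss(0, 0.02)

for _ in range(2000):
    task = random.choice(["detection", "classification"])
    device = schedule(task)
    reward = -observe(task, device)   # less degradation -> higher reward
    counts[(task, device)] += 1
    # Incremental sample-average update of the value estimate.
    Q[(task, device)] += (reward - Q[(task, device)]) / counts[(task, device)]

best_det = max(DEVICES, key=lambda d: Q[("detection", d)])
best_cls = max(DEVICES, key=lambda d: Q[("classification", d)])
print(best_det, best_cls)
```

After enough scheduling rounds, the learned estimates reflect each device's suitability per task type, which is the "heterogeneity awareness" the abstract describes; a full scheduler would also condition on cluster state rather than on task type alone.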
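The offloading mechanism and its latency cost can be made concrete with a confidence-threshold rule: a request is answered at the edge unless the edge model's confidence falls below the threshold, in which case it is forwarded to the cloud. The latencies and confidence values below are hypothetical, chosen only to show why raising the threshold inflates average latency.

```python
# Hypothetical per-request latencies; cloud offloading pays the edge
# attempt plus the round trip to the cloud model.
EDGE_LATENCY_MS = 15
CLOUD_LATENCY_MS = 220

def infer(edge_confidence, threshold):
    """Return (serving source, latency in ms) for one request."""
    if edge_confidence >= threshold:
        return "edge", EDGE_LATENCY_MS
    return "cloud", EDGE_LATENCY_MS + CLOUD_LATENCY_MS

# Invented edge-model confidences for a small batch of requests.
confidences = [0.95, 0.80, 0.55, 0.40, 0.90]

def avg_latency(threshold):
    return sum(infer(c, threshold)[1] for c in confidences) / len(confidences)

low = avg_latency(0.5)    # few requests offloaded
high = avg_latency(0.9)   # most requests offloaded
print(low, high)
```

A higher threshold offloads more requests and so drives up average latency, which is why the threshold should be set just high enough to recover the accuracy the edge models lack.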