Edge computing deploys servers at the edge of the network to provide computing services to nearby users, and is characterized by its ability to complete the computation required by end devices with low latency. In real application scenarios, an edge server must provide deep learning inference services to a large number of terminal devices. These workloads are demanding along two dimensions: high concurrency and a wide variety of task types. Targeting the edge-server computing capability typically provided by graphics processing units (GPUs), related work focuses mainly on model simplification and resource scheduling. The former suffers from a loss of accuracy, while the latter ignores the fact that batch processing can greatly improve the throughput of an edge server, and relies on numerical and simulation experiments to verify scheduling performance, which is difficult to reproduce in real systems. To address these problems, this paper proposes a batch-based scheduling strategy for a single type of task and a computing-resource allocation strategy for multiple types of tasks. The main contributions are summarized as follows:

1. We study scheduling strategies for a single type of task. Experimental tests verify that batch processing greatly improves task throughput. Based on this observation, we propose a task scheduling scheme built on dynamic batching and analyze the relationship among task arrival rate, batch size, and system throughput. We then define an optimization problem whose goal is to maximize throughput, and design a low-complexity approximation algorithm that finds a near-optimal batch size (a sketch of the batching logic and of a simple throughput model appears after this list). Finally, we build a testbed to evaluate the proposed solution.

2. We study the resource scheduling strategy for multiple types of tasks. Because different types of inference tasks interfere with one another during execution, the system state space becomes very large, so we propose a scheduling strategy based on deep reinforcement learning (see the sketch after this list). We first verify the feasibility of using virtualization technology to partition GPU computing power, and then propose a scheduling system architecture consisting of a task manager and a resource scheduler. We then define an NP-hard optimization problem that describes the scheduling of multiple task types, and design a scheduling algorithm based on deep reinforcement learning together with a corresponding task priority algorithm. Finally, we evaluate the proposed scheme on the testbed; the results show that the proposed strategy outperforms existing strategies.
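To make the dynamic batching idea in contribution 1 concrete, the following is a minimal sketch of a batching dispatcher. The class name DynamicBatcher, the fixed timeout rule, and the infer_fn callback are illustrative assumptions, not the thesis's implementation; the proposed scheme additionally adapts the batch size to the observed arrival rate.

```python
import queue
import threading
import time

class DynamicBatcher:
    """Collects inference requests and dispatches them in batches.

    A batch is released when either `batch_size` requests have
    accumulated or `timeout` seconds have elapsed since the first
    queued request, gaining throughput without unbounded waiting.
    (Hypothetical sketch; names and the timeout rule are assumptions.)
    """

    def __init__(self, batch_size, timeout, infer_fn):
        self.batch_size = batch_size   # target batch size b
        self.timeout = timeout         # max wait for a full batch (s)
        self.infer_fn = infer_fn       # runs one batched GPU inference
        self.requests = queue.Queue()

    def submit(self, request):
        self.requests.put(request)

    def run(self):
        while True:
            batch = [self.requests.get()]        # block for first item
            deadline = time.monotonic() + self.timeout
            while len(batch) < self.batch_size:
                remaining = deadline - time.monotonic()
                if remaining <= 0:
                    break
                try:
                    batch.append(self.requests.get(timeout=remaining))
                except queue.Empty:
                    break
            self.infer_fn(batch)                 # one batched forward pass

# Toy usage with a stand-in for a batched model forward pass.
def fake_infer(batch):
    print(f"inference on batch of {len(batch)}")

batcher = DynamicBatcher(batch_size=8, timeout=0.01, infer_fn=fake_infer)
threading.Thread(target=batcher.run, daemon=True).start()
for i in range(20):
    batcher.submit(i)
time.sleep(0.1)
```

The design tension this illustrates is the one the optimization problem captures: a larger batch amortizes the GPU's per-invocation cost and raises throughput, but forces early-arriving tasks to wait longer, which is why the batch size must be tuned to the arrival rate.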
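The relationship among arrival rate, batch size, and throughput analyzed in contribution 1 can be illustrated with a simple model. This is an assumed form for exposition only: the affine latency P(b), the delay bound D(b) <= D_max, and the symbols alpha, beta, lambda, b_max are not taken from the thesis.

```latex
% Illustrative model only; all symbols are assumptions, not the thesis's.
% \lambda: task arrival rate, b: batch size, D(b): worst-case task delay.
\begin{align}
  P(b) &= \alpha + \beta b
    && \text{(affine batched-inference latency)} \\
  T(b) &= \frac{b}{\max\{P(b),\ b/\lambda\}}
    && \text{(steady-state throughput)} \\
  b^\star &= \operatorname*{arg\,max}_{1 \le b \le b_{\max}} T(b)
    \quad \text{s.t. } D(b) \le D_{\max}
    && \text{(batch-size selection)}
\end{align}
```

Under this form, throughput is capped by the arrival rate \(\lambda\) when arrivals are slow and by the GPU batch latency \(P(b)\) when arrivals are fast, which is the trade-off a near-optimal batch-size search must balance.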
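For contribution 2, the following is a hedged sketch of the kind of deep-reinforcement-learning scheduler described, written as a one-step Q-learning update over virtual GPU partitions. The state and action encodings, the reward, the network shape, and all constants are illustrative assumptions; the thesis's actual algorithm and its task priority rule are not reproduced here.

```python
import random
import torch
import torch.nn as nn

N_TASK_TYPES = 4   # kinds of inference models served (assumed)
N_PARTITIONS = 4   # virtual GPU slices to assign (assumed)

# State: queue length per task type plus current load of each GPU slice.
STATE_DIM = N_TASK_TYPES + N_PARTITIONS
# Action: which virtual GPU slice the head-of-queue task is placed on.
N_ACTIONS = N_PARTITIONS

q_net = nn.Sequential(
    nn.Linear(STATE_DIM, 64), nn.ReLU(),
    nn.Linear(64, N_ACTIONS),
)
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)

def select_action(state, eps=0.1):
    """Epsilon-greedy choice over the Q-network's slice scores."""
    if random.random() < eps:
        return random.randrange(N_ACTIONS)
    with torch.no_grad():
        return int(q_net(state).argmax())

def td_update(state, action, reward, next_state, gamma=0.99):
    """One temporal-difference step toward r + gamma * max_a' Q(s', a')."""
    q = q_net(state)[action]
    with torch.no_grad():
        target = reward + gamma * q_net(next_state).max()
    loss = nn.functional.mse_loss(q, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# Toy interaction with a made-up environment transition.
s = torch.rand(STATE_DIM)
a = select_action(s)
r = torch.tensor(1.0)        # e.g. tasks finished within their deadline
s_next = torch.rand(STATE_DIM)
td_update(s, a, r, s_next)
```

The point of the sketch is the framing rather than the learning details: partitioning the GPU via virtualization turns resource allocation into a sequential decision problem whose state space is too large to enumerate, which motivates learning the placement policy.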