
Research On Low Delay Inference With Limited Communication And Computing Capability

Posted on: 2024-02-09
Degree: Master
Type: Thesis
Country: China
Candidate: Y Gao
Full Text: PDF
GTID: 2558307040987149
Subject: Engineering
Abstract/Summary:
With the development of Artificial Intelligence (AI) technology, neural networks are widely used in a variety of tasks. However, neural networks place high demands on hardware performance, and traditional local devices cannot undertake the complete inference task because of cost, computing power, memory, power consumption and other factors. Most of these networks therefore run on cloud servers: local devices upload their data to the cloud, which completes the calculation. As a result, task completion time depends not only on cloud computing power but also on the state of the communication network. When communication is poor, uploading raw data can take a long time, so the task is not completed in time. For low-delay inference tasks, how to reduce the impact of insufficient communication and computing power on inference delay and accuracy is therefore a problem that needs to be solved. With this motivation, this thesis studies the problem from two perspectives, algorithm and hardware. The details are as follows.

For the low-delay inference problem of neural networks under communication constraints, a DNN partitioning structure based on Cloud-Edge collaboration is used. On this basis, a Threshold-based Data Quantization and Exit (TDQE) method is proposed to optimize the data at the partitioning point. To make the best use of communication resources, the partitioning point data is classified: it is either uploaded after quantization to varying degrees or exits early on the local device. To obtain reliable classification thresholds, the problem is modeled as an accuracy optimization problem with communication constraints and solved by linear programming. To reduce the negative impact of quantization on accuracy, the quantization range of the data is further adjusted, and the thresholds and the quantization range are jointly optimized. Based on this optimization scheme, the thesis presents algorithms for data classification processing in a practical setting, including a TDQE network construction algorithm and a real-time data classification processing algorithm. Compared with two traditional methods, the TDQE method achieves a better balance between real-time performance and accuracy.

For the low-delay inference problem of neural networks under computing constraints, this thesis optimizes the running speed of the edge detection operator commonly used in image processing tasks. First, the edge detection operator is implemented in C. This includes building a development and testing framework for the C version, defining basic data types, implementing the related utility functions, developing the edge detection operator itself, and testing its functional accuracy. Then hardware-level optimization is applied to the operator. The single instruction, multiple data (SIMD) approach is applied to accelerate edge detection: the ARM NEON instruction set is used to optimize the convolution operation, where most of the computation in edge detection is concentrated, and the optimized code is inlined into the original operator so that it can be called on hardware platforms that meet the requirements. In addition, the OpenCL framework is used to accelerate the edge detection operator in parallel. Kernel functions written in the OpenCL C programming language carry out the logic of a single computing unit, and the OpenCL framework allows the data to be processed in parallel on the hardware while the operator runs. The experimental results show that accelerating the edge detection operator with SIMD and with the OpenCL framework improves its efficiency.

Finally, to test the effectiveness of the proposed algorithms in a practical setting, experiments are carried out on the HiLens platform. Multimedia terminal devices with AI inference capabilities and high-performance servers are selected as the hardware environment, the tasks are developed in the development framework and environment supported by the HiLens platform, and the performance of the TDQE method under communication constraints and of the accelerated edge detection operator is measured. The results show that the TDQE method has better real-time performance and accuracy than the two traditional algorithms and that the accelerated edge detection operator is more efficient. In other words, the optimization methods above achieve better performance in actual hardware applications.
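
The abstract describes the TDQE decision only in outline, so the following C sketch is an illustration under stated assumptions rather than the thesis's implementation. It assumes that the classification criterion is the confidence of a local early-exit branch, that there are exactly two upload precisions, and that quantization is uniform over a clipped range. All names (`tdqe_action`, `tdqe_thresholds`, `classify_sample`, `quantize`, `dequantize`), the bit widths, and the direction of the thresholds are hypothetical. In the thesis the thresholds are obtained by linear programming; here they are simply passed in.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical actions for one sample at the partitioning point. */
typedef enum {
    ACTION_EXIT_LOCALLY,    /* answer on the edge device, upload nothing   */
    ACTION_UPLOAD_LOW_BITS, /* upload coarsely quantized features          */
    ACTION_UPLOAD_HIGH_BITS /* upload finely quantized features            */
} tdqe_action;

/* Hypothetical thresholds; assumed to satisfy
 * exit_threshold >= low_bit_threshold. */
typedef struct {
    float exit_threshold;    /* confidence at or above this: exit locally  */
    float low_bit_threshold; /* confidence at or above this: low-bit upload */
} tdqe_thresholds;

/* Assumption: a more confident local prediction needs less help from the
 * cloud, so it either exits or is uploaded with fewer bits. */
tdqe_action classify_sample(float confidence, tdqe_thresholds t)
{
    if (confidence >= t.exit_threshold) {
        return ACTION_EXIT_LOCALLY;
    }
    if (confidence >= t.low_bit_threshold) {
        return ACTION_UPLOAD_LOW_BITS;
    }
    return ACTION_UPLOAD_HIGH_BITS;
}

/* Uniform quantization of x into 2^bits levels over the range [lo, hi].
 * Values outside the range are clipped, which is one simple way to
 * "adjust the quantization range". */
uint8_t quantize(float x, float lo, float hi, int bits)
{
    assert(bits >= 1 && bits <= 8);
    assert(hi > lo);
    if (x < lo) x = lo;
    if (x > hi) x = hi;
    const float max_level = (float)((1u << bits) - 1u);
    const float scaled = (x - lo) / (hi - lo) * max_level;
    return (uint8_t)(scaled + 0.5f); /* round to nearest; scaled >= 0 */
}

/* Inverse mapping used on the cloud side to recover an approximate value. */
float dequantize(uint8_t q, float lo, float hi, int bits)
{
    assert(bits >= 1 && bits <= 8);
    assert(hi > lo);
    const float max_level = (float)((1u << bits) - 1u);
    return lo + (float)q / max_level * (hi - lo);
}
```

In such a sketch, the edge device would call `classify_sample` once per input and, for the two upload actions, apply `quantize` to each feature value with a small or a large `bits` argument before transmission. Narrowing `[lo, hi]` gives finer resolution inside the range at the cost of clipping outliers, which is the trade-off that a jointly optimized quantization range has to balance.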
Keywords/Search Tags: Cloud-Edge collaboration, DNN partitioning, linear programming, edge detection, hardware level optimization