
Research And Implementation Of Convolution Optimization And Parallel Scheduling Based On TVM

Posted on: 2023-01-24 | Degree: Master | Type: Thesis
Country: China | Candidate: C W Wang | Full Text: PDF
GTID: 2568307127989059 | Subject: Computer technology
Abstract/Summary:
With the development of artificial intelligence technology, the emergence of algorithms such as the Convolutional Neural Network (CNN) has increased the difficulty of deploying and developing algorithms across different platforms. The Tensor Virtual Machine (TVM), a general-purpose neural network compiler, can optimize different types of neural networks and generate highly optimized low-level code for hardware platforms, and it has become one of the major optimization and deployment platforms in the field of artificial intelligence. However, performance bottlenecks caused by load imbalance and communication overhead make it difficult for different devices to fully utilize their hardware resources. Therefore, based on behavior analysis of CNN algorithms, this thesis proposes convolution optimization and computational-graph-partition parallel scheduling methods on top of TVM.

To mine the branch information and convolution feature information in CNN algorithms, the behavior patterns of CNN algorithms on TVM are analyzed in depth. First, branch information is extracted from the computational graph by post-order traversal (a sketch of this traversal appears below), obtaining the start, end, and internal nodes of each branch, from which a feature computational graph is constructed. Second, with the help of TVM, the memory-access and computation features of convolutions are extracted. Finally, the computational graph is partitioned manually according to the obtained branch information. Experimental results show that, compared with the traditional TVM method, the method based on branch features achieves an average speedup of 18%, and the convolution optimization method based on convolution features achieves an average speedup of 20%, indicating that the feature information is mined effectively.

To address the long access times caused by the discontinuous data addresses of the Memory-Efficient Convolution (MEC) algorithm on traditional devices, an optimization method tailored to the access behavior of the MEC algorithm is proposed. The method has two parts: the intermediate matrix transformation and the matrix operation. First, the intermediate matrix transformation is optimized by modifying the data-reading order so that the reading pattern matches the access behavior of the algorithm. Next, for the matrix-operation part, the convolution-kernel matrix is rearranged into a memory layout better suited to matrix operations, and the computation of the intermediate matrix and the kernel matrix is redesigned using the compute functions encapsulated by the TVM platform (sketched below). Finally, the platform's own parallel library is used to accelerate the computation. Experimental results show that, compared with MEC, the average speed is improved by 50% on a single convolutional layer and by more than 57% on a multi-layer neural network.
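To make the branch-extraction step concrete, the following is a minimal Python sketch of post-order traversal over a computational graph, recording branch starts (fan-out greater than one) and branch ends (fan-in greater than one). The `Node` class and function names are illustrative assumptions, not the thesis's actual TVM implementation.

```python
from collections import defaultdict

class Node:
    """Hypothetical graph node; `inputs` are predecessor nodes."""
    def __init__(self, name, inputs=()):
        self.name = name
        self.inputs = list(inputs)

def find_branches(outputs):
    """Post-order traversal recording branch starts (fan-out > 1)
    and branch ends (fan-in > 1)."""
    order, fanout, visited = [], defaultdict(int), set()

    def visit(node):
        if node in visited:
            return
        visited.add(node)
        for pred in node.inputs:
            fanout[pred] += 1        # count consumers of each node
            visit(pred)
        order.append(node)           # post-order: after all inputs

    for out in outputs:
        visit(out)

    starts = [n for n in order if fanout[n] > 1]      # branch opens
    ends   = [n for n in order if len(n.inputs) > 1]  # branches merge
    return order, starts, ends

# Example: a diamond conv0 -> (branch1, branch2) -> concat
a = Node("conv0")
b, c = Node("branch1", [a]), Node("branch2", [a])
d = Node("concat", [b, c])
order, starts, ends = find_branches([d])
print([n.name for n in starts], [n.name for n in ends])  # ['conv0'] ['concat']
```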
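The MEC-style lowering described above can be sketched in TVM's tensor expression language. Below is a minimal illustration for a single-channel input and a stride-1 valid convolution, assuming a TVM version that still provides the `te.create_schedule` API (e.g., 0.13); the shapes and schedule are assumptions, not the thesis's optimized code.

```python
import tvm
from tvm import te

H, W, K = 32, 32, 3               # input size and kernel size (assumed)
OH, OW = H - K + 1, W - K + 1     # valid convolution, stride 1

Inp  = te.placeholder((H, W), name="Inp")
Kern = te.placeholder((K, K), name="Kern")

# Intermediate matrix: row w holds the H x K strip Inp[:, w:w+K],
# flattened row-major so that later reads are address-contiguous.
L = te.compute((OW, H * K),
               lambda w, m: Inp[m // K, w + m % K], name="L")

# Matrix-operation part: contract each strip with the kernel.
i = te.reduce_axis((0, K), name="i")
j = te.reduce_axis((0, K), name="j")
Out = te.compute((OH, OW),
                 lambda h, w: te.sum(L[w, (h + i) * K + j] * Kern[i, j],
                                     axis=[i, j]), name="Out")

s = te.create_schedule(Out.op)
s[L].parallel(L.op.axis[0])       # platform parallelism on the transform
s[Out].parallel(Out.op.axis[0])
f = tvm.build(s, [Inp, Kern, Out], target="llvm")
```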
To address the problems that the computational-graph-partition method in TVM relies on expert experience and offers only a single partition strategy, a subgraph partition method based on branch features is proposed. First, the computational graph is traversed forward based on the feature information of its branches to find the start and end nodes of each branch, which are sliced out and stored in an array. Then each entry of the array is expanded into a subgraph, and the input/output dependencies between subgraphs are counted and stored in an array. Finally, the dependency information between subgraphs is used to configure the input and output dependencies of each subgraph and to select and configure parameters and device information (a sketch of the partition step follows below). Experimental results show that, with 48 and 96 CPU cores, the speed of the CNN algorithm is improved by 20% and 15%, respectively, compared with the traditional TVM execution mechanism, effectively achieving the partition of the computational graph.

To solve the problem that a single subgraph cannot be executed in parallel in TVM, a branch-parallel method is designed and implemented. First, a directed acyclic graph and a post-dominator tree are built to record the key values, order, and dependencies of nodes. Second, this information is used to search for branches, and the branch nodes are packaged into functions and labeled as parallel nodes. Third, after the parallel-graph marking is complete, a parallel runtime processes the computational graph, which involves the design of an inter-branch thread pool, data interaction, and execution (sketched below). Experimental results show that, compared with TVM's traditional serial method, the branch-parallel method improves inference speed by 10% on CPU and 20% on GPU, and by 5% on average compared with a greedy algorithm, making efficient use of hardware device resources.
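As a concrete illustration of the subgraph partition step described above, the following Python sketch slices a topologically ordered node list at branch boundaries and records cross-subgraph input/output dependencies. The graph encoding and helper names are hypothetical stand-ins for the thesis's TVM-level implementation.

```python
def partition(nodes, edges, cut_points):
    """nodes: topologically ordered node ids; edges: (src, dst) pairs;
    cut_points: ids at which a new subgraph begins."""
    subgraphs, current = [], []
    for n in nodes:
        if n in cut_points and current:
            subgraphs.append(current)    # close the previous subgraph
            current = []
        current.append(n)
    subgraphs.append(current)

    owner = {n: i for i, sg in enumerate(subgraphs) for n in sg}
    deps = set()
    for src, dst in edges:               # cross-subgraph edges become
        if owner[src] != owner[dst]:     # input/output dependencies
            deps.add((owner[src], owner[dst]))
    return subgraphs, sorted(deps)

# Example: diamond conv0 -> (b1, b2) -> concat, cut at branch boundaries
nodes = ["conv0", "b1", "b2", "concat"]
edges = [("conv0", "b1"), ("conv0", "b2"), ("b1", "concat"), ("b2", "concat")]
sgs, deps = partition(nodes, edges, cut_points={"b1", "b2", "concat"})
print(sgs)   # [['conv0'], ['b1'], ['b2'], ['concat']]
print(deps)  # [(0, 1), (0, 2), (1, 3), (2, 3)]
```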
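The branch-parallel runtime idea can likewise be sketched with a standard thread pool: subgraphs whose dependencies are satisfied form a wavefront and sibling branches execute concurrently. `run_subgraph` and the dependency table are assumptions standing in for the thesis's packaged branch functions, not its actual runtime.

```python
from concurrent.futures import ThreadPoolExecutor, wait

def run_subgraph(sg_id):
    print(f"executing subgraph {sg_id}")   # placeholder for compiled code

def execute(deps, n_subgraphs, workers=4):
    """deps: (producer, consumer) pairs between subgraphs.
    Assumes the dependency graph is acyclic."""
    remaining = {c: {p for p, c2 in deps if c2 == c}
                 for c in range(n_subgraphs)}
    done = set()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while len(done) < n_subgraphs:
            # all not-yet-run subgraphs whose producers have finished
            ready = [s for s in range(n_subgraphs)
                     if s not in done and remaining[s] <= done]
            futures = [pool.submit(run_subgraph, s) for s in ready]
            wait(futures)                   # barrier per wavefront
            done.update(ready)

execute({(0, 1), (0, 2), (1, 3), (2, 3)}, n_subgraphs=4)
# subgraph 0 runs first, then 1 and 2 in parallel, then 3
```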
Keywords/Search Tags: Tensor Virtual Machine, Memory-Efficient Convolution, Convolutional Neural Network, Behavior analysis, Computational graph partition, Parallelization