
Design and Implementation of AI Acceleration System Based on Network-on-Chip

Posted on: 2021-10-24; Degree: Master; Type: Thesis
Country: China; Candidate: X Li
GTID: 2518306050969309; Subject: Master of Engineering
Abstract/Summary:
Neural networks are a typical application of artificial intelligence algorithms and are widely used in fields such as image recognition, object detection, gesture recognition, and natural language processing. Processing a neural network application can require billions or even tens of billions of calculations. In most existing systems, data communication relies on bus technology to move data between off-chip memory and on-chip computing units. Because memory access bandwidth in these systems is low, it is difficult to meet the data communication demands of neural network applications, and this becomes a bottleneck for improving the performance of neural network acceleration systems. Network-on-Chip (NoC) applies network technology to chip design. A NoC-based neural network acceleration system reduces the number of off-chip memory accesses by reusing data, which relieves memory access pressure, improves the parallelism of the computing units, and speeds up processing. However, existing NoC-based neural network acceleration systems suffer from problems such as an irrational computing unit mapping strategy and low efficiency of the system processing flow. Optimizing the computing unit mapping strategy and designing an efficient system processing pipeline have therefore become key research problems for NoC-based neural network acceleration systems.

We first study the irrational computing unit mapping strategy. Because neural network processing is highly regular, existing NoC-based acceleration systems use only a single data channel for transmission, which leaves the communication resources of the NoC architecture underused. To solve this problem, we design a Multiple Channel Parallelization Acceleration Strategy (MCPAS), which is based on the multi-channel parallelism of the NoC. We map multiple convolution calculations onto the same set of computing units and use different data channels of the NoC architecture to transmit the different convolution results in parallel. To meet the increased data processing requirement caused by multiple convolution calculations, we use a time-slot design to calculate and transmit the different data from the computing units, which reduces the idle time of the data channels and computing units in the NoC architecture, enhances data processing parallelism, and increases the processing speed of the acceleration system. In this thesis, we implement an FPGA-based demonstration prototype of the acceleration scheme using MCPAS. The results show that, for VGGNet-16 inference, MCPAS is 430% and 237% faster than the traditional hardware solution and the traditional software solution, respectively.

We also study the low efficiency of the system processing flow. When the processing flow is not pipelined, each data packet transmitted on the NoC contains few data flits, so both the communication resource utilization and the effective data transmission rate of the acceleration system are low. To solve this problem, we design a Pipelined Transmission Computing Unit Acceleration Strategy (PTCAS), which is based on parallelizing the processing flow of the computing units. Exploiting the strong data reusability of neural network applications, we use pipelining and design computing clusters, each made up of multiple multiplication calculation units. The multiplication calculation units in the same computing cluster use the same input image data to perform convolution calculations with different sets of convolution kernels. The calculation results from the same computing cluster are transmitted as a single data packet. The data transmission process thus forms an efficient pipeline across multiple units, which reduces the idle time of the transmission channel and improves the effective data transmission rate. The simulation results show that, for VGGNet-16 inference, PTCAS is 133% faster than the traditional hardware solution. The acceleration system combining MCPAS and PTCAS is 357% faster than the traditional hardware solution, a further performance improvement over using either acceleration strategy alone.
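
The computing-cluster idea behind PTCAS can be illustrated with a small software model. The sketch below is not the thesis's hardware design: the names `Packet`, `cluster_compute`, and `packets_needed`, and the assumption of one payload flit per kernel result, are hypothetical and chosen only for illustration. It shows several units reusing one input window with different kernels, and their results being sent as one multi-flit packet rather than as many single-flit packets.

```python
from dataclasses import dataclass
from typing import List

Matrix = List[List[float]]


@dataclass
class Packet:
    """Hypothetical NoC packet: one payload flit per kernel result."""
    cluster_id: int
    flits: List[float]


def mac_window(window: Matrix, kernel: Matrix) -> float:
    """Multiply-accumulate of one input window with one kernel of the same shape."""
    return sum(w * k
               for w_row, k_row in zip(window, kernel)
               for w, k in zip(w_row, k_row))


def cluster_compute(cluster_id: int, window: Matrix, kernels: List[Matrix]) -> Packet:
    """Each unit in the cluster reuses the SAME window with its OWN kernel;
    all results leave the cluster together as a single packet."""
    return Packet(cluster_id, [mac_window(window, k) for k in kernels])


def packets_needed(num_results: int, flits_per_packet: int) -> int:
    """Number of packets required to carry num_results payload flits."""
    return -(-num_results // flits_per_packet)  # ceiling division


if __name__ == "__main__":
    window = [[1, 2], [3, 4]]
    kernels = [
        [[1, 0], [0, 1]],
        [[0, 1], [1, 0]],
        [[1, 1], [1, 1]],
    ]
    packet = cluster_compute(0, window, kernels)
    print(packet.flits)  # [5, 5, 10]
    # One result per packet versus one packet per three-unit cluster.
    print(packets_needed(12, 1), packets_needed(12, 3))  # 12 4
```

Under these assumptions, grouping results per cluster cuts the packet count by the cluster size, so per-packet overhead is paid less often. The model ignores header size, routing, and timing, which determine the actual gain in hardware.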
Keywords/Search Tags: Artificial Intelligence, Convolutional Neural Network, Network-on-Chip, parallelization, data packet partitioning, acceleration