
ZYNQ-Based Reconfigurable Convolutional Neural Network Accelerator

Posted on: 2021-05-03
Degree: Master
Type: Thesis
Country: China
Candidate: L Q Ma
Full Text: PDF
GTID: 2428330605472943
Subject: Electronic and communication engineering
Abstract/Summary:
Convolutional neural networks (CNNs) are a deep learning structure extended from artificial neural networks. In recent years they have been widely used in video surveillance, mobile robot vision, image search engines, and other fields. CNNs are computation-intensive, and general-purpose processors cannot fully exploit their parallelism or meet their real-time requirements. At present, CNNs are mainly implemented on GPUs, but the high power consumption of GPUs makes them unsuitable for embedded devices. ZYNQ is an SoC platform that supports software-hardware co-development: designs built on ZYNQ benefit both from ARM's rich ecosystem and from the flexibility and reconfigurability of the FPGA fabric. Based on the ZYNQ platform, this work implements a convolutional neural network accelerator with high parallelism, reconfigurability, high throughput, and low power consumption.

This thesis first introduces the basic principles and structure of CNNs, and derives a convolution circuit structure that can compute in parallel according to the computational characteristics of the CNN algorithm. It then analyzes the circuit structures and data communication methods of loop unrolling and loop tiling, and proposes an optimal loop design scheme. The design uses an ARM+FPGA computing framework: the hardware side implements the forward-propagation computation of the CNN model, while the software side handles data transfer and control. The convolutional layer circuit, pooling layer circuit, activation function layer circuit, and memory access circuit are implemented in hardware. To reduce memory bandwidth demand, a special arrangement of the computation data in memory is proposed.

Through software-hardware co-design, classification with the VGG16 network is completed. The average classification time for a single image is 250 ms, and the top-5 accuracy is 91.80%, only 0.5% lower than the software computation. The accelerator achieves an effective computing performance of 62.00 GOPS, which is 2.58 times that of the GPU and 6.88 times that of the CPU, and its MAC utilization reaches 98.20%, close to the theoretical value given by the Roofline model. The accelerator's computing power consumption is only 2.0 W, and its energy efficiency is 31.00 GOPS/W, 112.77 times that of the GPU and 334.41 times that of the CPU. Experimental results show that the CNN accelerator proposed in this thesis is suitable for embedded devices. Compared with recent work in related fields, the proposed solution provides higher performance under limited resource and power budgets. The accelerator is also applicable to other systems built on neural network architectures and has high practical value.
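To make the loop-unrolling and tiling scheme mentioned above concrete, the following is a minimal C sketch of a tiled convolution loop nest. The dimensions, tile sizes (TM, TN), and loop order shown here are illustrative assumptions and are not taken from the thesis; in the actual accelerator the inner tile loops would correspond to parallel multiply-accumulate units in the FPGA fabric.

    /* Minimal sketch of a tiled convolution loop nest; all sizes and
     * tile factors below are hypothetical, not the thesis's values. */
    #include <stdio.h>

    #define N_IN   4   /* input feature maps  */
    #define N_OUT  8   /* output feature maps */
    #define H      8   /* feature-map height  */
    #define W      8   /* feature-map width   */
    #define K      3   /* kernel size         */
    #define TM     4   /* output-channel tile (unrolled in hardware) */
    #define TN     2   /* input-channel tile  (unrolled in hardware) */

    static float in [N_IN ][H + K - 1][W + K - 1];
    static float wts[N_OUT][N_IN][K][K];
    static float out[N_OUT][H][W];

    int main(void) {
        /* Outer loops walk over tiles; in an FPGA design each tile maps to
         * on-chip buffers so off-chip data is reused across the tile. */
        for (int mo = 0; mo < N_OUT; mo += TM)
          for (int no = 0; no < N_IN; no += TN)
            for (int r = 0; r < H; r++)
              for (int c = 0; c < W; c++)
                /* Inner tile loops: these would be fully unrolled into
                 * TM*TN parallel multiply-accumulate units. */
                for (int m = mo; m < mo + TM; m++)
                  for (int n = no; n < no + TN; n++)
                    for (int i = 0; i < K; i++)
                      for (int j = 0; j < K; j++)
                        out[m][r][c] += wts[m][n][i][j] * in[n][r + i][c + j];

        printf("out[0][0][0] = %f\n", out[0][0][0]);
        return 0;
    }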
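The Roofline model cited above bounds attainable performance by the smaller of peak compute throughput and memory bandwidth times operational intensity; the general form of that bound, together with the energy-efficiency arithmetic implied by the reported figures (62.00 GOPS at 2.0 W), is written below. The symbols P_peak, B, and I are generic Roofline quantities, not values reported in the thesis.

    \[
      P_{\mathrm{attainable}} \;=\; \min\!\bigl(P_{\mathrm{peak}},\; B \cdot I\bigr),
      \qquad
      \frac{62.00\ \mathrm{GOPS}}{2.0\ \mathrm{W}} \;=\; 31.00\ \mathrm{GOPS/W}.
    \]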
Keywords/Search Tags: convolutional neural networks, hardware acceleration, Field Programmable Gate Array, Roofline model