
Design And Implementation Of Reconfigurable Special Hardware Accelerator For Convolutional Neural Network Based On FPGA

Posted on: 2024-01-05
Degree: Master
Type: Thesis
Country: China
Candidate: Y W Zhao
Full Text: PDF
GTID: 2558307079960109
Subject: Computer Science and Technology
Abstract/Summary:
With the growth of computing power and memory in modern computers, deep learning algorithms have developed rapidly. Convolutional Neural Networks (CNNs) are widely used in image classification, recognition, and target detection. However, the CPU (Central Processing Unit) cannot handle computation-intensive CNN models, and while the GPU (Graphics Processing Unit) offers large-scale computing performance, its power consumption is difficult to accommodate in embedded scenarios. ASIC (Application Specific Integrated Circuit) hardware suffers from long development cycles and insufficient flexibility. In contrast, the FPGA (Field Programmable Gate Array) is flexible, can meet the large-scale computing requirements of CNNs, and suits low-power, high-energy-efficiency scenarios. This thesis therefore designs a reconfigurable special-purpose CNN accelerator based on FPGA.

Firstly, this thesis analyzes the basic operators of CNNs, studies convolution-loop optimization from three aspects: loop tiling, loop unrolling, and loop interchange, and presents the basic optimization scheme adopted in this thesis. It then maps these optimization techniques onto the hardware implementation and quantitatively evaluates the effects of different optimization parameters on the accelerator's computing latency, memory-access latency, and memory footprint. For features and weights, this thesis also proposes a hardware-friendly quantization scheme.

Based on these optimization techniques, the thesis then describes the implementation of the accelerator in detail. The accelerator is implemented in a parameterized way, using variable parameters to apply different optimization strategies to different convolutional layers so as to maximize the utilization of computing and memory resources. For parallel computing, this thesis also proposes a dynamic scheme that combines a fixed parallel size with a variable parallel size, balancing logic resources against flexibility. In addition, the accelerator supports rapid reconfiguration through hardware-software co-design: software generates the configuration files and data layout, and evaluates the accelerator's performance to assist deployment of the hardware accelerator.

Finally, this thesis sets up a verification and test platform based on the ZCU102 development board to verify the functional correctness of the accelerator, and tests its inference performance with the VGG-16 network model. Evaluated on the ILSVRC2012 dataset, the model loses only 0.380% accuracy with 8/16-bit fixed-point arithmetic. The accelerator uses 2048 MACs for convolution, runs at a computing clock of 136 MHz, and consumes 5.65 W. The latency of a single-image inference is 94.19 ms, the average performance reaches 328.49 GOPS, and the average performance of the convolution operations reaches 472.71 GOPS. The overall energy-efficiency ratio is 10.57 times that of CPU inference, meeting the requirements of low power consumption and high performance.
Keywords/Search Tags: FPGA, Hardware Acceleration, Convolutional Neural Network