Design And Application Of Convolutional Neural Network Accelerator Based On FPGA

Posted on: 2022-03-02

Degree: Master

Type: Thesis

Country: China

Candidate: P C Bai

GTID: 2518306329452164

Subject: Master of Engineering (Electronics and Communication Engineering)

Abstract/Summary:

Convolutional neural networks perform well in machine vision and natural language processing. As these networks have developed, the models have become increasingly complex, and running them requires hardware with very high computing power. Traditional general-purpose CPUs increasingly fail to meet the requirements for computing speed and power consumption. GPUs process data quickly, but their high power consumption makes them difficult to use in portable devices. A field-programmable gate array (FPGA) processes data faster than a CPU and consumes less power than a GPU, so this work studies the use of an FPGA to run convolutional neural networks with high efficiency and low power consumption. Based on the PYNQ-Z2 FPGA platform, this thesis designs an accelerator for the YOLOv2 network model and applies it to image detection.

The thesis analyzes the YOLOv2 network model and calculates its time complexity and space complexity. To speed up the network and reduce resource consumption within the fixed resources of the PYNQ-Z2 platform, it converts the network's floating-point operations into fixed-point operations executed on the FPGA. In the storage module, three cache structures (the feature input buffer, the local storage of the processing units, and the weight buffer) are optimized to reduce the latency caused by data transfer, and weight reuse is optimized to reduce on-chip static random access memory usage and off-chip accesses to intermediate data. The convolution module uses loop tiling (blocking), loop unrolling, and loop interchange to improve parallel computing capability. The pooling module uses max pooling, which reduces logic operations to some extent. The output buffer module uses a single-buffer output mode to reduce data transfer, and ReLU is chosen as the activation function.

Based on the characteristics of each YOLOv2 layer and the resource limits of the PYNQ-Z2 platform, the accelerator uses a heterogeneous acceleration scheme. The convolutional layers, pooling layers, and activation functions, which have few parameters but a large amount of computation, are implemented on the FPGA; the fully connected layer and other layers with many parameters but little computation run on the ARM processor. The FPGA core uses vectorized data and a computation pipeline to raise FPGA resource utilization and improve the performance of parallel computation on the FPGA.

To verify the performance of the YOLOv2 accelerator, images were randomly selected from the test set for detection and recognition. The experimental results show that the accelerator's processing speed improves significantly and meets the expected requirements. At a working frequency of 150 MHz, the accelerator's static power consumption is 0.187 W, its dynamic power consumption is 2.524 W, its total power consumption is 2.711 W, and its processing speed is 27.03 GOPS. Compared with the E5-2620 v4 CPU, this design consumes about 3% of the processor's power and improves the performance-to-power ratio by nearly 200 times. Compared with a ZYNQ-based convolutional neural network accelerator that uses floating-point arithmetic, processing speed increases by approximately 1.58 times and power consumption decreases by about 32%. Compared with an FPGA-based OpenCL accelerator for Tiny-YOLOv2, processing speed increases by 25% and efficiency by 2%.
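The fixed-point conversion and the convolution, ReLU, and max-pooling stages described in the abstract can be modeled in software to check numerical behavior before hardware synthesis. The following is a minimal NumPy sketch under stated assumptions, not the thesis's HLS code: the 8-fractional-bit format, the tile sizes `tm`/`tn` over output and input channels, stride-1 unpadded convolution, and all function names are hypothetical. The channel loops are placed innermost to show where loop interchange and unrolling would expose parallel multiply-accumulate units on an FPGA.

```python
import numpy as np

# Hypothetical fixed-point format: signed integers with 8 fractional bits.
# The abstract does not state the bit widths; this is an assumption.
FRAC_BITS = 8


def to_fixed(x, frac_bits=FRAC_BITS):
    """Quantize floats to fixed-point integers (round to nearest)."""
    return np.round(np.asarray(x, dtype=np.float64) * (1 << frac_bits)).astype(np.int64)


def to_float(q, frac_bits=FRAC_BITS):
    """Convert fixed-point integers back to floats."""
    return np.asarray(q, dtype=np.float64) / (1 << frac_bits)


def conv2d_tiled(ifm, w, tm=2, tn=2):
    """Stride-1, unpadded convolution on fixed-point data.

    ifm: (N, H, W) input feature map, w: (M, N, K, K) weights, both with
    FRAC_BITS fractional bits. Each product carries 2*FRAC_BITS fractional
    bits, so the int64 accumulators are returned in that format.
    tm/tn are hypothetical tile sizes over output/input channels.
    """
    n_in, h, wd = ifm.shape
    m_out, n_w, k, _ = w.shape
    assert n_in == n_w, "channel mismatch"
    oh, ow = h - k + 1, wd - k + 1
    out = np.zeros((m_out, oh, ow), dtype=np.int64)
    for m0 in range(0, m_out, tm):            # tile over output channels
        for n0 in range(0, n_in, tn):         # tile over input channels
            for r in range(oh):
                for c in range(ow):
                    for i in range(k):
                        for j in range(k):
                            # Channel loops moved innermost (loop interchange);
                            # in hardware these tm x tn MACs would be unrolled
                            # into parallel multipliers.
                            for m in range(m0, min(m0 + tm, m_out)):
                                for n in range(n0, min(n0 + tn, n_in)):
                                    out[m, r, c] += w[m, n, i, j] * ifm[n, r + i, c + j]
    return out


def relu(x):
    return np.maximum(x, 0)


def maxpool2x2(x):
    """2x2 max pooling with stride 2 (an odd trailing row/column is dropped)."""
    c, h, w = x.shape
    h2, w2 = h // 2, w // 2
    return x[:, :2 * h2, :2 * w2].reshape(c, h2, 2, w2, 2).max(axis=(2, 4))


def conv2d_ref(ifm, w):
    """Floating-point reference convolution for comparison."""
    m_out, _, k, _ = w.shape
    oh, ow = ifm.shape[1] - k + 1, ifm.shape[2] - k + 1
    out = np.zeros((m_out, oh, ow))
    for m in range(m_out):
        for r in range(oh):
            for c in range(ow):
                out[m, r, c] = np.sum(w[m] * ifm[:, r:r + k, c:c + k])
    return out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    ifm = rng.uniform(-1.0, 1.0, size=(3, 6, 6))
    w = rng.uniform(-0.5, 0.5, size=(4, 3, 3, 3))
    acc = conv2d_tiled(to_fixed(ifm), to_fixed(w))
    y_fixed = to_float(maxpool2x2(relu(acc)), 2 * FRAC_BITS)
    y_ref = maxpool2x2(relu(conv2d_ref(ifm, w)))
    print("max abs error vs. float:", np.max(np.abs(y_fixed - y_ref)))
```

Because the accumulators stay in integer form until the final conversion, changing `tm` and `tn` in this model changes only the loop schedule, not the result; in hardware the tile sizes instead trade throughput against DSP and on-chip memory usage. ReLU and max pooling operate directly on the integer accumulators, which is one reason they are cheap to place on the FPGA side.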
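As a consistency check on the reported measurements, total power is the sum of static and dynamic power, and dividing throughput by total power gives an energy-efficiency figure in GOPS/W. The efficiency value below is derived here from the stated numbers; it is not quoted in the abstract itself.

```latex
\begin{aligned}
P_{\text{total}} &= P_{\text{static}} + P_{\text{dynamic}} = 0.187\,\text{W} + 2.524\,\text{W} = 2.711\,\text{W},\\
\eta &= \frac{27.03\ \text{GOPS}}{2.711\ \text{W}} \approx 9.97\ \text{GOPS/W}.
\end{aligned}
```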

Keywords/Search Tags:

Convolutional Neural Network, FPGA, picture processing, YOLOv2, PYNQ-Z2
