With the rapid development of deep learning, the scale and complexity of deep neural networks keep growing, placing ever higher demands on the processing performance and energy consumption of the computing platform. However, CPUs and other general-purpose platforms are constrained by their serial execution model and limited memory bandwidth, which makes them poorly suited to the large-scale parallel computation of deep neural networks. As a result, it is difficult for general-purpose computer systems to exploit their full computing power, which limits the performance and energy efficiency achievable for deep neural networks. Dedicated hardware acceleration for deep neural networks has therefore become a focus of current academic and industrial research. The FPGA (Field Programmable Gate Array) is a common platform for implementing and verifying hardware circuits, with the advantages of reconfigurability, low cost, and high parallelism. This thesis therefore designs an accelerator for deep neural networks and chooses an FPGA as the verification platform. The main research work of this thesis includes the following points:

1. To address the storage-intensive nature of the fully connected layer, a fine-grained pruning algorithm is used to increase the sparsity of the network's weight data, and a run-length coding scheme is proposed to further compress the network model and reduce the storage required for the weight parameters. The thesis also proposes a matrix-vector multiplication circuit that operates directly on the encoded data without decompressing it, which saves data-decoding time, greatly reduces the number of MAC operations required, and improves the acceleration of the fully connected layer.

2. To address the computation-intensive nature of the convolutional layer, the multidimensional convolution is decomposed into multiple one-dimensional convolutions computed in parallel, improving data reusability and the reuse efficiency of the on-chip cache. The roofline model is then used to find a suitable loop-tiling strategy that reduces SDRAM accesses, thereby lowering memory-access power consumption and improving performance.

3. This thesis implements a hardware accelerator for the AlexNet network and verifies it on an FPGA. The correctness and speedup of the accelerator are verified by comparing its execution against a CPU, and its area, power consumption, speed, and other aspects are compared against similar accelerators proposed in prior work.
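The run-length scheme of point 1 can be illustrated with a minimal sketch. The encoding format below (a list of `(zeros_before, value)` pairs) and the function names are assumptions for illustration, not the thesis's actual hardware format; the point is that the dot product runs directly on the encoded stream, so zero weights cost no MACs and no decode step is needed.

```python
import numpy as np

def rle_encode(weights):
    """Encode a pruned weight vector as (run_of_zeros, value) pairs,
    one pair per nonzero entry (assumed illustrative format)."""
    pairs, run = [], 0
    for w in weights:
        if w == 0:
            run += 1
        else:
            pairs.append((run, w))
            run = 0
    return pairs

def sparse_dot(pairs, x):
    """Dot product of the encoded weights with a dense vector x,
    computed without decoding: one MAC per nonzero weight."""
    acc, idx = 0.0, 0
    for zeros, w in pairs:
        idx += zeros          # skip the run of pruned (zero) weights
        acc += w * x[idx]     # single MAC for this nonzero weight
        idx += 1
    return acc

w = np.array([0, 0, 3.0, 0, -2.0, 0, 0, 1.0])
x = np.arange(8, dtype=float)
assert np.isclose(sparse_dot(rle_encode(w), x), w @ x)
```

Here 8 weights shrink to 3 encoded pairs, and only 3 MACs are performed instead of 8, which mirrors the MAC-count reduction claimed for the fully connected layer.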
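The decomposition of point 2 can be sketched as follows. This is a software analogue, not the thesis's circuit: a 2-D convolution (cross-correlation, as in CNN inference) is computed as a sum of independent 1-D row convolutions, each of which could occupy its own hardware lane and reuse the same image row across kernel positions.

```python
import numpy as np

def conv2d_via_1d(img, ker):
    """Valid 2-D cross-correlation built from 1-D row convolutions.

    Each of the R kernel rows is convolved with the matching image
    rows and the 1-D results are accumulated; the R row passes are
    independent, so in hardware they can run in parallel."""
    H, W = img.shape
    R, S = ker.shape
    out = np.zeros((H - R + 1, W - S + 1))
    for r in range(R):                  # one 1-D lane per kernel row
        for i in range(H - R + 1):
            # np.convolve with a reversed kernel row == 1-D correlation
            out[i] += np.convolve(img[i + r], ker[r][::-1], mode="valid")
    return out

img = np.arange(25, dtype=float).reshape(5, 5)
ker = np.array([[1.0, 0.0], [0.0, -1.0]])
ref = np.array([[np.sum(img[i:i + 2, j:j + 2] * ker) for j in range(4)]
                for i in range(4)])
assert np.allclose(conv2d_via_1d(img, ker), ref)
```

Each image row is consumed by several 1-D lanes (one per kernel row that overlaps it), which is the data-reuse opportunity the abstract refers to.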
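The roofline argument of point 2 can also be made concrete. The tile parameters and traffic model below (Tm output maps, Tn input maps, a Tr x Tc output tile, K x K kernels, stride 1, and per-tile off-chip traffic of weights plus input tile plus output tile) are illustrative assumptions, not the thesis's actual tiling; they show how larger tiles raise arithmetic intensity and move a memory-bound design toward the compute roof.

```python
def tile_intensity(Tm, Tn, Tr, Tc, K, bytes_per_word=4):
    """FLOPs per byte of off-chip (SDRAM) traffic for one tile
    under the assumed traffic model."""
    flops = 2 * Tm * Tn * Tr * Tc * K * K          # MACs = 2 FLOPs each
    words = (Tm * Tn * K * K                       # weights for the tile
             + Tn * (Tr + K - 1) * (Tc + K - 1)    # input tile (with halo)
             + Tm * Tr * Tc)                       # output tile
    return flops / (words * bytes_per_word)

def attainable_gflops(peak_gflops, bw_gb_s, intensity):
    """Roofline bound: performance is capped by either the compute
    peak or bandwidth times arithmetic intensity."""
    return min(peak_gflops, bw_gb_s * intensity)

small = tile_intensity(8, 4, 4, 4, 3)
large = tile_intensity(64, 16, 16, 16, 3)
assert large > small   # bigger tiles -> more on-chip reuse per byte fetched
```

A tiling whose intensity sits below the ridge point (`peak_gflops / bw_gb_s`) is memory-bound, so enlarging the tiles until the intensity crosses the ridge is exactly the lever for cutting SDRAM traffic and memory-access power that the abstract describes.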