With the rapid development of deep learning, the scale and complexity of deep neural networks keep growing, placing ever higher demands on the processing performance and energy consumption of the computing platform. However, CPUs and other general-purpose platforms are constrained by their serial execution model and limited memory bandwidth, which makes them poorly suited to the large-scale parallel computation of deep neural networks. As a result, it is difficult for general-purpose computer systems to exploit their full computing power, which limits the performance and energy efficiency achievable for deep neural networks. Dedicated hardware acceleration for deep neural networks has therefore become a focus of current academic and industrial research. The FPGA (Field Programmable Gate Array) is a common platform for implementing and verifying hardware circuits, with the advantages of reconfigurability, low cost, and high parallelism. This thesis therefore designs an accelerator for deep neural networks and chooses an FPGA as the verification platform. The main research work of this thesis includes the following points:

1. To address the storage-intensive nature of the fully connected layer, a fine-grained pruning algorithm is used to increase the sparsity of the network's weight data, and a run-length coding scheme is proposed to further compress the network model and reduce the storage required for the weight parameters. The thesis also proposes a matrix-vector multiplication circuit that operates directly on the encoded data without decompressing it, which saves data-decoding time, greatly reduces the number of MAC operations required, and improves the acceleration of the fully connected layer.

2. To address the computation-intensive nature of the convolutional layer, the multidimensional convolution is decomposed into multiple one-dimensional convolutions computed in parallel, improving data reusability and the reuse efficiency of the on-chip cache. The roofline model is then used to find a suitable loop-tiling strategy that reduces SDRAM accesses, thereby lowering memory-access power consumption and improving performance.

3. This thesis implements a hardware accelerator for the AlexNet network and verifies it on an FPGA. The correctness and speedup of the accelerator are verified by comparing its execution against a CPU, and its area, power consumption, speed, and other aspects are compared against similar accelerators proposed in prior work.
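The run-length scheme of point 1 can be illustrated with a minimal sketch. The encoding format below (a list of `(zeros_before, value)` pairs) and the function names are assumptions for illustration, not the thesis's actual hardware format; the point is that the dot product runs directly on the encoded stream, so zero weights cost no MACs and no decode step is needed.

```python
import numpy as np

def rle_encode(weights):
    """Encode a pruned weight vector as (run_of_zeros, value) pairs,
    one pair per nonzero entry (assumed illustrative format)."""
    pairs, run = [], 0
    for w in weights:
        if w == 0:
            run += 1
        else:
            pairs.append((run, w))
            run = 0
    return pairs

def sparse_dot(pairs, x):
    """Dot product of the encoded weights with a dense vector x,
    computed without decoding: one MAC per nonzero weight."""
    acc, idx = 0.0, 0
    for zeros, w in pairs:
        idx += zeros          # skip the run of pruned (zero) weights
        acc += w * x[idx]     # single MAC for this nonzero weight
        idx += 1
    return acc

w = np.array([0, 0, 3.0, 0, -2.0, 0, 0, 1.0])
x = np.arange(8, dtype=float)
assert np.isclose(sparse_dot(rle_encode(w), x), w @ x)
```

Here 8 weights shrink to 3 encoded pairs, and only 3 MACs are performed instead of 8, which mirrors the MAC-count reduction claimed for the fully connected layer.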
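The decomposition of point 2 can be sketched as follows. This is a software analogue, not the thesis's circuit: a 2-D convolution (cross-correlation, as in CNN inference) is computed as a sum of independent 1-D row convolutions, each of which could occupy its own hardware lane and reuse the same image row across kernel positions.

```python
import numpy as np

def conv2d_via_1d(img, ker):
    """Valid 2-D cross-correlation built from 1-D row convolutions.

    Each of the R kernel rows is convolved with the matching image
    rows and the 1-D results are accumulated; the R row passes are
    independent, so in hardware they can run in parallel."""
    H, W = img.shape
    R, S = ker.shape
    out = np.zeros((H - R + 1, W - S + 1))
    for r in range(R):                  # one 1-D lane per kernel row
        for i in range(H - R + 1):
            # np.convolve with a reversed kernel row == 1-D correlation
            out[i] += np.convolve(img[i + r], ker[r][::-1], mode="valid")
    return out

img = np.arange(25, dtype=float).reshape(5, 5)
ker = np.array([[1.0, 0.0], [0.0, -1.0]])
ref = np.array([[np.sum(img[i:i + 2, j:j + 2] * ker) for j in range(4)]
                for i in range(4)])
assert np.allclose(conv2d_via_1d(img, ker), ref)
```

Each image row is consumed by several 1-D lanes (one per kernel row that overlaps it), which is the data-reuse opportunity the abstract refers to.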
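The roofline argument of point 2 can also be made concrete. The tile parameters and traffic model below (Tm output maps, Tn input maps, a Tr x Tc output tile, K x K kernels, stride 1, and per-tile off-chip traffic of weights plus input tile plus output tile) are illustrative assumptions, not the thesis's actual tiling; they show how larger tiles raise arithmetic intensity and move a memory-bound design toward the compute roof.

```python
def tile_intensity(Tm, Tn, Tr, Tc, K, bytes_per_word=4):
    """FLOPs per byte of off-chip (SDRAM) traffic for one tile
    under the assumed traffic model."""
    flops = 2 * Tm * Tn * Tr * Tc * K * K          # MACs = 2 FLOPs each
    words = (Tm * Tn * K * K                       # weights for the tile
             + Tn * (Tr + K - 1) * (Tc + K - 1)    # input tile (with halo)
             + Tm * Tr * Tc)                       # output tile
    return flops / (words * bytes_per_word)

def attainable_gflops(peak_gflops, bw_gb_s, intensity):
    """Roofline bound: performance is capped by either the compute
    peak or bandwidth times arithmetic intensity."""
    return min(peak_gflops, bw_gb_s * intensity)

small = tile_intensity(8, 4, 4, 4, 3)
large = tile_intensity(64, 16, 16, 16, 3)
assert large > small   # bigger tiles -> more on-chip reuse per byte fetched
```

A tiling whose intensity sits below the ridge point (`peak_gflops / bw_gb_s`) is memory-bound, so enlarging the tiles until the intensity crosses the ridge is exactly the lever for cutting SDRAM traffic and memory-access power that the abstract describes.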