| The convolutional neural network (CNN) is a well-known, classic neural network model, and convolution is one of its most important forms of computation. Reducing the power consumption and increasing the speed of convolution is an important problem in neural network research. Existing CPUs have difficulty meeting the speed and power requirements of convolution, so this paper proposes a convolutional neural network acceleration system based on asynchronous methods. First, this paper investigates convolutional neural network algorithms in depth and chooses the Caffe framework to implement them. In the recognition operation of this model, however, a large share of the time is spent on convolution. To accelerate the convolution algorithm in the Caffe framework more effectively, this paper proposes an FPGA-based convolutional neural network computing acceleration system. The software part of the system is based on the MTCNN model in the Caffe framework, and the hardware part is based on an asynchronous convolution accelerator; the two exchange data through DMA via DDR memory. ZYNQ is an FPGA-based SoC platform developed by Xilinx. It consists of a PL side, the FPGA programmable logic, and a PS side built around an Arm core. In this method, the convolution algorithm of the convolutional neural network is implemented in the FPGA on the PL side, which exchanges data with the Linux operating system on the PS side through DMA. After the FPGA completes the convolution, the result is returned to Linux through DMA. The structure of the convolution unit is described in Verilog and synthesized and implemented with Vivado. A calculation that takes millions of cycles can be reduced to about 10,000 cycles. Second, this paper details the convolution calculation module and the on-chip storage mechanism on the FPGA, mainly the read and write modes of the weight matrix and the image matrix and the method used for matrix multiplication. After full investigation and analysis, it is found that using asynchronous handshake signals instead of a synchronous clock signal makes the design easier to modularize and manage, and effectively avoids problems such as clock skew and slow speed. The design also has a large advantage in power consumption: in simulation tests, the floating-point adder IP core provided by Xilinx consumes 21.843 W, while this design consumes 7.006 W. This advantage becomes more apparent as the floating-point adder is used more frequently. Finally, test results from the actual implementation show that this design is about twice as fast as the original CPU-based face recognition on the ZYNQ7020. |
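
The abstract describes convolution as a matrix multiplication between a weight matrix and an image matrix. The following is a minimal software sketch of that idea, not the paper's Verilog design. It assumes a single channel, stride 1, and no padding, and the names `im2col`, `conv2d_as_matmul`, and `conv2d_direct` are illustrative rather than taken from the paper. Each kernel-sized window of the image is unrolled into one row, so every output pixel becomes one multiply-accumulate chain, which is the kind of operation an accelerator would be expected to perform.

```python
def im2col(image, k):
    """Unroll every k x k window of a 2-D image (list of rows) into one row."""
    h, w = len(image), len(image[0])
    rows = []
    for i in range(h - k + 1):
        for j in range(w - k + 1):
            rows.append([image[i + di][j + dj]
                         for di in range(k) for dj in range(k)])
    return rows


def conv2d_as_matmul(image, kernel):
    """Stride-1, unpadded CNN-style convolution as a matrix-vector product."""
    k = len(kernel)
    flat_kernel = [v for row in kernel for v in row]
    out_w = len(image[0]) - k + 1
    # One multiply-accumulate chain per output pixel.
    flat_out = [sum(a * b for a, b in zip(patch, flat_kernel))
                for patch in im2col(image, k)]
    # Fold the flat result back into a 2-D output map.
    return [flat_out[r:r + out_w] for r in range(0, len(flat_out), out_w)]


def conv2d_direct(image, kernel):
    """Reference implementation with explicit nested loops, for comparison."""
    k = len(kernel)
    out_h, out_w = len(image) - k + 1, len(image[0]) - k + 1
    out = [[0] * out_w for _ in range(out_h)]
    for i in range(out_h):
        for j in range(out_w):
            for di in range(k):
                for dj in range(k):
                    out[i][j] += image[i + di][j + dj] * kernel[di][dj]
    return out


if __name__ == "__main__":
    image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
    kernel = [[1, 0], [0, -1]]
    print(conv2d_as_matmul(image, kernel))
    print(conv2d_direct(image, kernel))
```

Both functions give the same output map. The unrolled form trades extra memory for a regular access pattern, which is one reason a matrix-multiplication formulation is convenient to map onto on-chip storage.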