| Convolutional neural networks (CNNs) perform well in machine vision and natural language processing. As CNNs have developed, their models have grown increasingly complex, and running them requires hardware with substantial computing power. General-purpose CPUs increasingly fail to meet the requirements for computing speed and power consumption. GPUs offer excellent processing speed, but their high power consumption makes them difficult to use in portable devices. A field-programmable gate array (FPGA) processes faster than a CPU and consumes less power than a GPU. This paper therefore studies the use of an FPGA to run convolutional neural networks with high efficiency and low power consumption. Based on the PYNQ-Z2 FPGA platform, it presents the design of a YOLOv2 network accelerator and applies it to image detection. The YOLOv2 network model is analyzed first. To speed up the network and reduce resource consumption within the inherent resources of the PYNQ-Z2 platform, the floating-point operations in the network are converted to fixed-point operations on the FPGA. The time complexity and space complexity of the YOLOv2 model are analyzed and calculated. In the storage module, three cache structures are optimized to reduce the delay caused by data transmission: the feature input buffer, the local storage of the processing units, and the weight buffer. Weight reuse is optimized to reduce the occupation of on-chip static random-access memory (SRAM) and to reduce off-chip accesses to intermediate data. The convolution module is optimized with loop tiling (blocking), loop unrolling, and loop interchange to improve parallel computing capability. The pooling module uses max pooling, which reduces logic operations to some extent. The output buffer module is optimized to use a single-buffer output mode to reduce data transmission, and ReLU is selected as the activation function. Given the characteristics of each YOLOv2 layer and the resource limits of the PYNQ-Z2 platform, the accelerator adopts a heterogeneous acceleration scheme. The convolutional layers, pooling layers, and activation functions, which have few parameters but a large amount of computation, are implemented on the FPGA. The fully connected layer and other layers with many parameters but little computation are implemented on the ARM processor. In the FPGA kernel implementation, design methods such as data vectorization and computation pipelining are adopted to improve FPGA resource utilization and the acceleration performance of parallel computing on the FPGA. To verify the performance of the YOLOv2 accelerator, pictures were randomly selected from the test set for detection and identification. The experimental results show that the processing speed of the accelerator is significantly improved and meets the expected requirements. Specifically, at a working frequency of 150 MHz, the static power consumption of the accelerator is 0.187 W, the dynamic power consumption is 2.524 W, the total power consumption is 2.711 W, and the processing speed is 27.03 GOPS. Compared with the E5-2620 v4 CPU, the power consumption of this design is about 3% of that of the processor, and the performance-to-power ratio is improved by nearly 200 times. Compared with a convolutional neural network accelerated on the ZYNQ platform using floating-point operations, the processing speed is increased by approximately 1.58 times and the power consumption is reduced by about 32%. Compared with an FPGA-based OpenCL platform that accelerates Tiny-Yolo-v2, the processing speed is increased by 25% and the efficiency is increased by 2%. |
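
The float-to-fixed-point conversion mentioned in the abstract can be illustrated with a minimal Python sketch. The abstract does not state the word length or the position of the binary point, so the 16-bit width and 8 fractional bits used here are assumptions, and the helper names (`to_fixed`, `to_float`, `fixed_dot`, `relu_fixed`) are hypothetical. The sketch only shows the general idea of integer multiply-accumulate followed by rescaling; it is not the paper's actual implementation.

```python
# Illustrative sketch only. TOTAL_BITS and FRAC_BITS are assumed values,
# not parameters taken from the paper.
TOTAL_BITS = 16                      # assumed signed word length
FRAC_BITS = 8                        # assumed number of fractional bits
SCALE = 1 << FRAC_BITS               # 2**FRAC_BITS
Q_MIN = -(1 << (TOTAL_BITS - 1))     # smallest representable integer
Q_MAX = (1 << (TOTAL_BITS - 1)) - 1  # largest representable integer


def to_fixed(x: float) -> int:
    """Quantize a float to a signed fixed-point integer, saturating on overflow."""
    q = round(x * SCALE)
    return max(Q_MIN, min(Q_MAX, q))


def to_float(q: int) -> float:
    """Convert a fixed-point integer back to a float for inspection."""
    return q / SCALE


def fixed_dot(weights, activations) -> int:
    """Integer multiply-accumulate over quantized weights and activations.

    Each product carries 2 * FRAC_BITS fractional bits, so the accumulator
    is shifted right by FRAC_BITS to return to the working format.
    """
    acc = 0
    for w, a in zip(weights, activations):
        acc += to_fixed(w) * to_fixed(a)
    return acc >> FRAC_BITS          # arithmetic shift, rounds toward -infinity


def relu_fixed(q: int) -> int:
    """ReLU on a fixed-point value: negative values become zero."""
    return max(0, q)


if __name__ == "__main__":
    w = [0.5, -1.25, 2.0]
    a = [1.5, 0.75, -0.25]
    y = fixed_dot(w, a)
    print(to_float(y), to_float(relu_fixed(y)))
```

In this format the quantization error of a single value is at most half of one least-significant bit (1/512 with the assumed 8 fractional bits), which is the kind of accuracy cost traded for cheaper integer arithmetic on the FPGA.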