Compared with traditional algorithms, neural network-based methods usually achieve much better performance in their respective domains and have been applied in many fields, such as speech recognition, object detection, and object segmentation. However, the computational cost of these methods is usually very high, which limits the use of neural networks in embedded scenarios such as VR/AR, mobile phones, smart security, and autonomous driving. To address this problem, this thesis explores an engineering-feasible, embedded-platform-oriented deployment scheme for convolutional networks and verifies it on an FPGA, providing system-level support for building an FPGA-based convolutional network accelerator.

Taking the YOLO algorithm as a representative, this thesis summarizes the computational characteristics of convolutional networks and demonstrates that the convolution computation can be parallelized on an FPGA. Since FPGA computation relies on fixed-point arithmetic, which conflicts with the floating-point operations of existing networks, an integer quantization method for convolutional networks is improved. The method uses statistical extrema to dynamically quantize the inputs, outputs, and weights of the network, ensuring that the forward pass uses integer arithmetic only and resolving the numerical conflict on the FPGA. Notably, the improved quantization method simplifies convolutional network inference while keeping the accuracy loss below 2%.

A hardware acceleration architecture for the YOLO algorithm is then designed based on HLS. The system consists of two parts: parallel inference of the convolutional network in programmable logic, and data scheduling on the ARM processor. A general platform based on the Xilinx ZC706 evaluation board was designed for algorithm testing and verification of the YOLO acceleration architecture. The hardware acceleration architecture designed in this thesis runs 19 times faster than the CPU.

Finally, the performance of the improved network quantization method is analyzed, and the network accuracy before and after quantization is compared. The resource utilization of the hardware implementation is reported, and the performance of the YOLO algorithm implemented on the FPGA is compared and analyzed. The thesis concludes by pointing out its limitations and directions for further research.
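To make the quantization idea above concrete, the following is a minimal Python sketch of extremum-based symmetric int8 quantization with integer-only accumulation. The function names, 8-bit width, and per-tensor scaling are illustrative assumptions, not the thesis's exact implementation.

import numpy as np

def quantize_tensor(x, num_bits=8):
    # Derive a symmetric scale from the observed extremum (max absolute value),
    # then map floating-point values to signed integers.
    qmax = 2 ** (num_bits - 1) - 1          # 127 for int8
    scale = np.max(np.abs(x)) / qmax        # statistical extremum -> scale
    q = np.clip(np.round(x / scale), -qmax, qmax).astype(np.int8)
    return q, scale

def int_matmul(q_in, q_w, s_in, s_w):
    # Integer-only accumulation; the combined scale is applied once afterwards
    # (or folded into the next layer's input scale on the hardware side).
    acc = np.matmul(q_in.astype(np.int32), q_w.astype(np.int32))
    return acc, s_in * s_w

# Usage: quantize a random activation and weight, run an integer product,
# then dequantize once to compare against the floating-point result.
x = np.random.randn(1, 16).astype(np.float32)
w = np.random.randn(16, 8).astype(np.float32)
q_x, s_x = quantize_tensor(x)
q_w, s_w = quantize_tensor(w)
acc, s_out = int_matmul(q_x, q_w, s_x, s_w)
y_approx = acc * s_out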