
Research And Implementation Of Hardware Acceleration For Object Detection Network

Posted on: 2023-06-18
Degree: Master
Type: Thesis
Country: China
Candidate: Y Y Hu
Full Text: PDF
GTID: 2568307025969839
Subject: Integrated circuit engineering
Abstract/Summary:
An Object Detection Network is a deep learning model that locates and classifies objects in input images. It is computationally intensive but highly parallel, and has been widely used in face recognition, military detection, maritime positioning, autonomous driving, and other fields. To improve detection accuracy, many Object Detection Networks have continually expanded their structures, increasing the number of parameters and the amount of computation; inference consequently becomes very slow, making such networks difficult to deploy on embedded platforms that have strict limits on storage and computing resources as well as real-time requirements. To meet the requirement of embedded deployment of an Object Detection Network, the main work of this thesis is as follows:

First, the Tiny-YOLOv3 network is selected as the algorithm model, and the model is optimized through feature-channel pruning, fixed-point quantization of activation values, power-of-two quantization of weights, and fusion of batch-normalization (BN) parameters.

Second, for the optimized model, a hardware accelerator for the convolution layers is designed. The accelerator contains a highly parallel array of processing elements that perform convolution operations, uses multi-level storage to realize ping-pong buffering, and pipelines the computations that follow the convolution.

Finally, a functional simulation environment is built to verify the convolution-layer accelerator, and an embedded CPU+FPGA deployment and verification system for the accelerator is implemented on a Zynq chip.

The experimental results show that, compared with the original model, the optimized Object Detection Network achieves a compression ratio of 5% with an mAP of 0.47, making it suitable for embedded deployment. The convolution-layer hardware accelerator delivers 48 GOPS of computing performance without using the DSP resources inside the Zynq chip, and the total power consumption of the hardware system is 2.656 W. Compared with existing research, the accelerator consumes fewer resources and achieves higher performance.
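Two of the optimization steps mentioned above, power-of-two weight quantization and BN parameter fusion, can be illustrated with a minimal NumPy sketch. This is an assumption-laden illustration of the general techniques, not the thesis's actual implementation; the exponent range and parameter shapes are hypothetical choices.

```python
import numpy as np

def quantize_weights_pow2(w, min_exp=-7, max_exp=0):
    """Quantize each weight to the nearest signed power of two.

    A power-of-two weight lets the multiplication in a convolution be
    replaced by a bit shift in hardware, which is one reason such an
    accelerator can avoid using DSP blocks. The exponent range
    (min_exp..max_exp) is a hypothetical choice.
    """
    sign = np.sign(w)
    mag = np.abs(w)
    # Clamp magnitudes before log2 so zeros do not produce -inf;
    # zero weights stay exactly zero via the final np.where.
    exp = np.round(np.log2(np.maximum(mag, 2.0 ** min_exp)))
    exp = np.clip(exp, min_exp, max_exp)
    return sign * np.where(mag > 0, 2.0 ** exp, 0.0)

def fuse_bn(w, b, gamma, beta, mean, var, eps=1e-5):
    """Fold batch-norm parameters into the preceding convolution's
    weights and bias so inference needs no separate BN step.

    w: conv weights of shape (out_ch, in_ch, kh, kw); b: bias (out_ch,);
    gamma/beta/mean/var: per-channel BN parameters of shape (out_ch,).
    """
    scale = gamma / np.sqrt(var + eps)
    w_fused = w * scale[:, None, None, None]
    b_fused = (b - mean) * scale + beta
    return w_fused, b_fused
```

For example, a weight of 0.3 is rounded to 0.25 (= 2^-2), so the multiply becomes a right shift by two; the fused BN produces a single convolution whose output equals conv-then-BN of the original pair.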
Keywords/Search Tags: Object Detection Network, Network Compression, Hardware Acceleration, Deep Learning