
The Design And Implementation Of Object Detection Chip Based On Deep Learning Algorithms

Posted on: 2020-05-31
Degree: Master
Type: Thesis
Country: China
Candidate: Y H Zeng
Full Text: PDF
GTID: 2428330596994979
Subject: Circuits and Systems
Abstract/Summary:
With the development of the Internet and Moore's law, deep learning has advanced rapidly, driven by easier access to data and the growing computing power of hardware. At the same time, object detection has improved greatly alongside deep learning. Object detection has a wide range of applications, including surveillance systems and online merchandise recognition, both of which are computed in the cloud, as well as real-time object detection and map building on embedded devices. State-of-the-art object detection is mostly based on computationally intensive deep learning algorithms, which poses a challenge for resource-limited embedded devices. Because of data-security concerns in the communication between embedded devices and servers, embedded devices should be able to run object detection algorithms locally. However, most embedded devices are resource-limited and are not designed to process CNNs (Convolutional Neural Networks). It is therefore worthwhile to study and design a deep-learning-based object detection chip. This thesis discusses the hardware architecture, performance, hardware utilization, power and DRAM accesses of such a chip, and designs and implements a deep-learning-based object detection chip. The main work is as follows:

(1) The characteristics of CNNs are analyzed, and a CNN hardware accelerator and its storage architecture are designed and implemented. A hybrid data reuse pattern is supported to reduce DRAM accesses, which lowers system power. The accelerator achieves high computational parallelism because its processing element (PE) matrix computes 2-D convolutions efficiently. The register matrix layer combines the convolution, batch normalization and activation function of a convolution layer with the pooling of the following pooling layer, which enhances data reuse and speeds up the transition from the convolution layer to the pooling layer.
(2) An object detection system based on YOLOv2-tiny is built around the CNN hardware accelerator described above. System tests in a fixed-point format are completed, and hardware/software co-design is used to partition the system so that each part takes advantage of the type of hardware best suited to it. In addition, a detailed hardware architecture and performance analysis of the pre-processing, post-processing and video-stream stages of the detection system is given.
(3) Beyond a functional simulation of the CNN hardware accelerator, the complete ASIC back-end design flow with Synopsys Design Compiler (DC) and IC Compiler (ICC) is presented, and the resulting power, area and timing reports are analyzed.
(4) YOLOv2-tiny is used as the benchmark, and a Xilinx FPGA is used as the design and simulation platform. According to the simulation and implementation results in Vivado, the architecture achieves 9.06 GMACs (billion multiply-accumulate operations per second) at 100 MHz with 32-bit and 16-bit fixed-point data precision, and the system power is 6.525 W. The proposed detection system can theoretically process 3.63 fps. As the DC and ICC timing reports show, the CNN hardware accelerator runs at 100 MHz after back-end design, with a chip area of only 3.5 mm × 3.5 mm and a power consumption of 204 mW.
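The parallelism attributed to the processing element (PE) matrix in item (1) can be illustrated with a small cycle-count model. The sketch below is a hypothetical mapping, not the thesis's actual dataflow: it assumes a `rows` × `cols` PE matrix in which each PE owns one output pixel of a tile and one weight is broadcast to all PEs per cycle, so a fully occupied tile performs `rows * cols` MACs per cycle. The function name `pe_array_conv` and the tiling order are illustrative assumptions.

```python
from typing import List, Tuple

Matrix = List[List[int]]


def pe_array_conv(x: Matrix, k: Matrix, rows: int, cols: int) -> Tuple[Matrix, int, int]:
    """Hypothetical model of a rows x cols PE matrix computing a 2-D convolution.

    Each PE owns one output pixel of the current output tile. Every cycle one
    weight k[a][b] is broadcast to all PEs and each active PE performs one MAC.
    Returns (output map, cycles used, MACs performed).
    """
    kh, kw = len(k), len(k[0])
    oh, ow = len(x) - kh + 1, len(x[0]) - kw + 1  # stride 1, no padding
    out = [[0] * ow for _ in range(oh)]
    cycles = macs = 0
    for ti in range(0, oh, rows):          # walk output tiles
        for tj in range(0, ow, cols):
            for a in range(kh):            # one broadcast weight per cycle
                for b in range(kw):
                    cycles += 1
                    for r in range(rows):  # all PEs work in the same cycle
                        for c in range(cols):
                            i, j = ti + r, tj + c
                            if i < oh and j < ow:  # PEs past the map edge stay idle
                                out[i][j] += x[i + a][j + b] * k[a][b]
                                macs += 1
    return out, cycles, macs


if __name__ == "__main__":
    x = [[i * 6 + j for j in range(6)] for i in range(6)]
    k = [[1, 2, 1], [0, 0, 0], [-1, -2, -1]]
    for size in (2, 3):
        _, cycles, macs = pe_array_conv(x, k, size, size)
        util = macs / (cycles * size * size)
        print(f"{size}x{size} PEs: {cycles} cycles, {macs} MACs, utilization {util:.2f}")
```

For a 4 × 4 output map, a 2 × 2 matrix stays fully busy, while a 3 × 3 matrix leaves PEs idle at ragged tile edges; in this model, that is one way average throughput can fall below the peak MAC count.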
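The register matrix layer in item (1) merges convolution, batch normalization, activation and pooling. The following minimal software sketch of that kind of fusion assumes a single channel, a leaky ReLU with slope 0.1 (the activation YOLOv2-tiny conventionally uses, assumed here rather than stated in the abstract) and 2 × 2 max pooling with stride 2. It compares a layer-by-layer reference with a fused loop that produces only pooled values. All function names and the `(gamma, beta, mean, var)` tuple layout are hypothetical.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]
BN = Tuple[float, float, float, float]  # (gamma, beta, mean, var), assumed layout


def conv_at(x: Matrix, k: Matrix, i: int, j: int) -> float:
    """One output of a stride-1, unpadded 2-D convolution (cross-correlation)."""
    return sum(x[i + a][j + b] * k[a][b] for a in range(len(k)) for b in range(len(k[0])))


def bn_leaky(v: float, bn: BN, eps: float = 1e-5, slope: float = 0.1) -> float:
    """Batch normalization followed by a leaky ReLU."""
    gamma, beta, mean, var = bn
    y = gamma * (v - mean) / math.sqrt(var + eps) + beta
    return y if y > 0 else slope * y


def layer_by_layer(x: Matrix, k: Matrix, bn: BN) -> Matrix:
    """Reference: materialize the full conv map, activate it, then pool 2x2/stride 2."""
    oh, ow = len(x) - len(k) + 1, len(x[0]) - len(k[0]) + 1
    act = [[bn_leaky(conv_at(x, k, i, j), bn) for j in range(ow)] for i in range(oh)]
    return [[max(act[i][j], act[i][j + 1], act[i + 1][j], act[i + 1][j + 1])
             for j in range(0, ow - 1, 2)] for i in range(0, oh - 1, 2)]


def fused(x: Matrix, k: Matrix, bn: BN) -> Matrix:
    """Fused: compute the 4 conv outputs of one pooling window, activate, keep the max.

    Only pooled values are produced, so the full-resolution conv map is never
    stored and read back.
    """
    oh, ow = len(x) - len(k) + 1, len(x[0]) - len(k[0]) + 1
    return [[max(bn_leaky(conv_at(x, k, i + di, j + dj), bn)
                 for di in (0, 1) for dj in (0, 1))
             for j in range(0, ow - 1, 2)] for i in range(0, oh - 1, 2)]
```

For a 6 × 6 input and a 3 × 3 kernel, `fused` and `layer_by_layer` return the same 2 × 2 result. The fused version never holds the 4 × 4 intermediate map, which mirrors the data-reuse motivation given for the register matrix layer.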
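The abstract does not detail the hybrid data reuse pattern in item (1). One common way to realize such a scheme, sketched here purely as an assumption, is to estimate the DRAM traffic of each layer under an input-reuse loop order and under a weight-reuse loop order, then pick the cheaper one per layer. The traffic formulas ignore tile halos. The buffer size is hypothetical, and the two layer shapes are illustrative: a wide, shallow layer on a 416 × 416 map and a narrow, deep layer on a 13 × 13 map.

```python
import math
from dataclasses import dataclass
from typing import Tuple


@dataclass
class ConvLayer:
    name: str
    in_words: int      # H_in * W_in * C_in
    weight_words: int  # K * K * C_in * C_out
    out_words: int     # H_out * W_out * C_out


def input_reuse_traffic(layer: ConvLayer, input_buf: int) -> int:
    """Keep an input tile on chip and stream every weight past it once per tile."""
    tiles = math.ceil(layer.in_words / input_buf)
    return layer.in_words + tiles * layer.weight_words + layer.out_words


def weight_reuse_traffic(layer: ConvLayer, weight_buf: int) -> int:
    """Keep a weight group on chip and stream the whole input once per group."""
    groups = math.ceil(layer.weight_words / weight_buf)
    return layer.weight_words + groups * layer.in_words + layer.out_words


def choose_reuse(layer: ConvLayer, input_buf: int, weight_buf: int) -> Tuple[str, int]:
    """Hypothetical hybrid policy: pick whichever loop order moves fewer DRAM words."""
    a = input_reuse_traffic(layer, input_buf)
    b = weight_reuse_traffic(layer, weight_buf)
    return ("input reuse", a) if a <= b else ("weight reuse", b)


if __name__ == "__main__":
    BUF = 64 * 1024  # hypothetical on-chip buffer size in words, used for both buffers
    layers = [
        ConvLayer("early", 416 * 416 * 3, 3 * 3 * 3 * 16, 416 * 416 * 16),
        ConvLayer("late", 13 * 13 * 512, 3 * 3 * 512 * 1024, 13 * 13 * 1024),
    ]
    for layer in layers:
        print(layer.name, choose_reuse(layer, BUF, BUF))
```

With these assumed numbers, the early layer favours weight reuse because its weights fit on chip and the input is read only once. The late layer favours input reuse because its weights dominate the traffic. A hybrid pattern can exploit this kind of per-layer switch.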
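Item (2) tests the system in a fixed-point format, and item (4) reports 32-bit and 16-bit fixed-point precision. A plausible reading, assumed here rather than taken from the thesis, is 16-bit operands with a 32-bit accumulator. The sketch uses a hypothetical Q8.8 format (8 fractional bits), round-to-nearest quantization with saturation and a saturating 32-bit accumulator. The helper names are illustrative.

```python
def saturate(value: int, bits: int) -> int:
    """Clamp an integer to the signed range of a `bits`-wide register."""
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return max(lo, min(hi, value))


def to_fixed(x: float, frac_bits: int = 8, bits: int = 16) -> int:
    """Quantize a real value to signed fixed point (Q8.8 by default, round to nearest)."""
    return saturate(round(x * (1 << frac_bits)), bits)


def to_float(q: int, frac_bits: int = 8) -> float:
    """Convert a fixed-point integer back to a real value."""
    return q / (1 << frac_bits)


def fixed_dot(xs, ws, frac_bits: int = 8) -> int:
    """16-bit x 16-bit products accumulated in a 32-bit register, result in 16 bits."""
    acc = 0
    for x, w in zip(xs, ws):
        acc = saturate(acc + x * w, 32)  # each product carries 2 * frac_bits fractional bits
    # Arithmetic right shift drops the extra fractional bits (rounds toward -inf).
    return saturate(acc >> frac_bits, 16)


if __name__ == "__main__":
    xs = [to_fixed(1.5), to_fixed(-0.25)]
    ws = [to_fixed(2.0), to_fixed(4.0)]
    print(to_float(fixed_dot(xs, ws)))  # 1.5*2.0 + (-0.25)*4.0 = 2.0
```

Keeping the accumulator wider than the operands prevents intermediate sums of a convolution window from overflowing before the final shift back to 16 bits.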
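For the post-processing stage in item (2), a YOLOv2-style detector typically decodes each raw prediction with the YOLOv2 box parameterization (b_x = sigmoid(t_x) + c_x, b_y = sigmoid(t_y) + c_y, b_w = p_w * exp(t_w), b_h = p_h * exp(t_h)) and then applies non-maximum suppression (NMS). The sketch assumes that standard decode, a single class and an IoU threshold of 0.45. The threshold and anchor values are hypothetical, and the thesis's own post-processing may differ.

```python
import math
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x_center, y_center, width, height), normalized


def sigmoid(v: float) -> float:
    return 1.0 / (1.0 + math.exp(-v))


def decode(t: Tuple[float, float, float, float], cx: int, cy: int,
           anchor: Tuple[float, float], grid: int) -> Box:
    """YOLOv2 box decode for grid cell (cx, cy); the anchor is in grid-cell units."""
    tx, ty, tw, th = t
    pw, ph = anchor
    return ((sigmoid(tx) + cx) / grid, (sigmoid(ty) + cy) / grid,
            pw * math.exp(tw) / grid, ph * math.exp(th) / grid)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two center-format boxes."""
    ix = max(0.0, min(a[0] + a[2] / 2, b[0] + b[2] / 2) - max(a[0] - a[2] / 2, b[0] - b[2] / 2))
    iy = max(0.0, min(a[1] + a[3] / 2, b[1] + b[3] / 2) - max(a[1] - a[3] / 2, b[1] - b[3] / 2))
    inter = ix * iy
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: List[Box], scores: List[float], thresh: float = 0.45) -> List[int]:
    """Greedy NMS: keep the best-scoring box, drop any box overlapping a kept one above thresh."""
    keep: List[int] = []
    for i in sorted(range(len(boxes)), key=lambda n: scores[n], reverse=True):
        if all(iou(boxes[i], boxes[j]) <= thresh for j in keep):
            keep.append(i)
    return keep
```

For example, `decode((0.0, 0.0, 0.0, 0.0), 6, 6, (1.0, 1.0), 13)` places a box at the center of a 13 × 13 grid. With two heavily overlapping boxes and one distant box, `nms` keeps only the higher-scoring box of the overlapping pair plus the distant one.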
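The figures reported in item (4) can be related with simple arithmetic. The sketch below only divides the reported numbers and makes no assumption about the PE count. The efficiency figure uses the FPGA system power (6.525 W) rather than the 204 mW ASIC figure, because the abstract reports throughput only for the FPGA implementation. The constant and function names are illustrative.

```python
# Figures quoted in the abstract.
THROUGHPUT_MAC_PER_S = 9.06e9   # 9.06 GMACs (FPGA implementation, Vivado results)
CLOCK_HZ = 100e6                # 100 MHz
FPGA_SYSTEM_POWER_W = 6.525     # FPGA system power
ASIC_SIDE_MM = 3.5              # ASIC die is 3.5 mm x 3.5 mm


def macs_per_cycle(mac_per_s: float, clock_hz: float) -> float:
    """Average number of multiply-accumulates completed each clock cycle."""
    return mac_per_s / clock_hz


def gmacs_per_watt(mac_per_s: float, power_w: float) -> float:
    """Energy efficiency expressed in GMACs per watt."""
    return mac_per_s / 1e9 / power_w


if __name__ == "__main__":
    print(f"MACs per cycle: {macs_per_cycle(THROUGHPUT_MAC_PER_S, CLOCK_HZ):.1f}")
    print(f"FPGA efficiency: {gmacs_per_watt(THROUGHPUT_MAC_PER_S, FPGA_SYSTEM_POWER_W):.2f} GMACs/W")
    print(f"ASIC die area: {ASIC_SIDE_MM ** 2:.2f} mm^2")
```

This works out to about 90.6 MACs per cycle, roughly 1.39 GMACs/W on the FPGA and a 12.25 mm² die. A non-integer MAC count per cycle is an average, so on its own it does not reveal how many PEs the matrix contains.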
Keywords/Search Tags: Hardware Acceleration, YOLO (You Only Look Once), Software/Hardware Co-Design, FPGA (Field Programmable Gate Array), Back-End Design