
Design And Optimization Of An Object Detection System For Neural Network Acceleration

Posted on: 2023-04-20 | Degree: Master | Type: Thesis
Country: China | Candidate: S C Li | Full Text: PDF
GTID: 2558306848458214 | Subject: Software engineering

Abstract/Summary:
With the wide application of artificial intelligence algorithms in fields such as intelligent healthcare, face recognition, and autonomous driving, more and more researchers are working to deploy neural networks in resource-constrained embedded scenarios. The biggest challenge of deploying neural networks on embedded devices is their high demand for computing power and storage. The deployment of small deep neural networks on embedded devices has therefore become a research hotspot in industry. On the premise of meeting performance requirements, the core of this thesis is the design and implementation of an embedded object detection system optimized for lightweight convolutional neural network acceleration. The main research work is as follows:

(1) A hardware/software implementation platform is built, and the YOLOv2-Tiny algorithm is deployed on it for verification; the neural network accelerator, the Zynq-based embedded system, and its software applications are designed. Experimental results are collected and analyzed through serial-port tracing, and a verification method is designed to evaluate system performance metrics such as object detection accuracy, throughput, and resource utilization. The feasibility and optimization effect of each component and of the overall framework are verified through experimental analysis.

(2) The memory allocation requirements of the PL (Programmable Logic) side of the embedded system are studied, and a dynamic memory allocation method suitable for bare-metal operation is designed. This lightweight memory manager effectively reduces memory fragmentation and improves the stability of the system when deployed on bare metal. The scheduling flow of the neural network IP core in the bare-metal system is also designed and optimized.

(3) Fixed-point quantization is introduced to quantize the weights and feature maps of the neural network to 8-bit fixed-point values, which greatly reduces the size of the computing units and of the model and lowers hardware resource consumption while keeping the accuracy loss within an acceptable range. The main convolution computing unit and the max-pooling computing unit of the object detector are optimized and analyzed.

In this thesis, the YOLOv2-Tiny object detection algorithm is deployed on the Xilinx Zynq XC7Z045 platform. Only 43.76% of the on-chip BRAM and 17.11% of the DSP resources are used, and the chip power consumption is only 5.534 W. The performance reaches 65.1 GOP/s, and compared with similar state-of-the-art work the throughput is increased by 2.14 times, a considerable advantage. The heterogeneous object detection and processing system built through hardware/software co-design runs stably, requires little computation, runs fast, and is easy to port, giving it broad application prospects.
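To illustrate the lightweight memory management described in (2), the following is a minimal sketch of a fixed-block pool allocator of the kind commonly used on bare-metal systems to avoid heap fragmentation. The abstract does not specify the actual allocator design, so the names (pl_pool_init, pl_alloc, pl_free) and the block/pool sizes below are hypothetical.

```c
/* Minimal fixed-block pool allocator sketch for a bare-metal system.
 * Handing out equal-sized blocks from a static pool avoids the
 * fragmentation that general-purpose malloc/free can cause.
 * All sizes and names here are illustrative assumptions. */
#include <stddef.h>
#include <stdint.h>

#define PL_BLOCK_SIZE  4096u   /* bytes per block (assumed)    */
#define PL_BLOCK_COUNT 64u     /* number of blocks in the pool */

static uint8_t  pl_pool[PL_BLOCK_COUNT][PL_BLOCK_SIZE];
static uint8_t *pl_free_list[PL_BLOCK_COUNT];
static size_t   pl_free_top;

/* Build the free list once at startup. */
void pl_pool_init(void)
{
    for (size_t i = 0; i < PL_BLOCK_COUNT; ++i)
        pl_free_list[i] = pl_pool[i];
    pl_free_top = PL_BLOCK_COUNT;
}

/* O(1) allocation: pop one block, or NULL if the pool is exhausted. */
void *pl_alloc(void)
{
    return (pl_free_top > 0) ? pl_free_list[--pl_free_top] : NULL;
}

/* O(1) release: push the block back onto the free list. */
void pl_free(void *block)
{
    if (block != NULL && pl_free_top < PL_BLOCK_COUNT)
        pl_free_list[pl_free_top++] = (uint8_t *)block;
}
```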
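The 8-bit fixed-point quantization in (3) can be pictured with the simple symmetric, power-of-two-scale scheme sketched below. The abstract does not state the exact quantization scheme or scale selection used in the thesis, so this is an assumed reference formulation only.

```c
/* Sketch of symmetric 8-bit fixed-point quantization of a weight tensor.
 * A per-tensor power-of-two scale (number of fractional bits) is chosen
 * so that the largest magnitude still fits in int8, then each value is
 * rounded. Illustrative only; not necessarily the thesis's exact scheme. */
#include <math.h>
#include <stddef.h>
#include <stdint.h>

/* Choose the number of fractional bits so that max|w| * 2^frac <= 127. */
int choose_frac_bits(const float *w, size_t n)
{
    float max_abs = 0.0f;
    for (size_t i = 0; i < n; ++i)
        if (fabsf(w[i]) > max_abs)
            max_abs = fabsf(w[i]);
    int frac = 7;
    while (frac > 0 && max_abs * (float)(1 << frac) > 127.0f)
        --frac;
    return frac;
}

/* Quantize: q = round(w * 2^frac), clamped to the int8 range. */
void quantize_int8(const float *w, int8_t *q, size_t n, int frac)
{
    const float scale = (float)(1 << frac);
    for (size_t i = 0; i < n; ++i) {
        long v = lroundf(w[i] * scale);
        if (v > 127)  v = 127;
        if (v < -128) v = -128;
        q[i] = (int8_t)v;
    }
}

/* Dequantize back to float for accuracy checks: w ~= q / 2^frac. */
float dequantize_int8(int8_t q, int frac)
{
    return (float)q / (float)(1 << frac);
}
```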
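Likewise, the behaviour of the max-pooling computing unit mentioned in (3) corresponds to the reference loop below for a 2x2, stride-2 window on an int8 feature map; the hardware unit pipelines this computation on the FPGA, but the arithmetic is the same. The data layout and function names are illustrative assumptions.

```c
/* Reference behaviour of a 2x2, stride-2 max-pooling stage on an int8
 * feature map laid out as [channel][row][col]. Layout and names are
 * illustrative assumptions, not taken from the thesis. */
#include <stdint.h>

static inline int8_t max4(int8_t a, int8_t b, int8_t c, int8_t d)
{
    int8_t m = a;
    if (b > m) m = b;
    if (c > m) m = c;
    if (d > m) m = d;
    return m;
}

void maxpool2x2_int8(const int8_t *in, int8_t *out,
                     int channels, int in_h, int in_w)
{
    const int out_h = in_h / 2, out_w = in_w / 2;
    for (int c = 0; c < channels; ++c)
        for (int y = 0; y < out_h; ++y)
            for (int x = 0; x < out_w; ++x) {
                /* Pointer to the top-left pixel of the 2x2 window. */
                const int8_t *p = in + ((c * in_h + 2 * y) * in_w + 2 * x);
                out[(c * out_h + y) * out_w + x] =
                    max4(p[0], p[1], p[in_w], p[in_w + 1]);
            }
}
```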
Keywords/Search Tags:FPGA, Zynq, YOLOv2-Tiny, Neural Network Accelerator