
Design And Optimization Of An Object Detection System For Neural Network Acceleration

Posted on: 2023-04-20 | Degree: Master | Type: Thesis
Country: China | Candidate: S C Li | Full Text: PDF
GTID: 2558306848458214 | Subject: Software engineering

Abstract/Summary:
With the wide application of artificial intelligence algorithms in fields such as intelligent healthcare, face recognition, and autonomous driving, more and more researchers are working to deploy neural networks in resource-constrained embedded scenarios. The biggest challenge of deploying neural networks on embedded devices is their high demand for computing power and storage. The deployment of small deep neural networks on embedded devices has therefore become a research hotspot in industry. On the premise of meeting performance requirements, the core of this thesis is the design and implementation of an embedded object detection system optimized for lightweight convolutional neural network acceleration. The main research work is as follows:

(1) A hardware/software implementation platform is built, and the YOLOv2-Tiny algorithm is deployed on it for verification; the neural network accelerator, the Zynq-based embedded system, and its software applications are designed. Experimental results are collected and analyzed through serial-port tracing, and a verification method is designed to evaluate system performance metrics such as object detection accuracy, throughput, and resource utilization. The feasibility and optimization effect of each component and of the overall framework are verified through experimental analysis.

(2) The memory allocation requirements of the PL (Programmable Logic) side of the embedded system are studied, and a dynamic memory allocation method suitable for bare-metal operation is designed. This lightweight memory manager effectively reduces memory fragmentation and improves the stability of the system when deployed on bare metal. The scheduling flow of the neural network IP core in the bare-metal system is also designed and optimized.

(3) Fixed-point quantization is introduced to quantize the weights and feature maps of the neural network to 8-bit fixed-point values, which greatly reduces the size of the computing units and of the model and lowers hardware resource consumption while keeping the accuracy loss within an acceptable range. The main convolution computing unit and the max-pooling computing unit of the object detector are optimized and analyzed.

In this thesis, the YOLOv2-Tiny object detection algorithm is deployed on the Xilinx Zynq XC7Z045 platform. Only 43.76% of the on-chip BRAM and 17.11% of the DSP resources are used, and the chip power consumption is only 5.534 W. The performance reaches 65.1 GOP/s, and compared with similar state-of-the-art work the throughput is increased by 2.14 times, a considerable advantage. The heterogeneous object detection and processing system built through hardware/software co-design runs stably, requires little computation, runs fast, and is easy to port, giving it broad application prospects.
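To illustrate the lightweight memory management described in (2), the following is a minimal sketch of a fixed-block pool allocator of the kind commonly used on bare-metal systems to avoid heap fragmentation. The abstract does not specify the actual allocator design, so the names (pl_pool_init, pl_alloc, pl_free) and the block/pool sizes below are hypothetical.

```c
/* Minimal fixed-block pool allocator sketch for a bare-metal system.
 * Handing out equal-sized blocks from a static pool avoids the
 * fragmentation that general-purpose malloc/free can cause.
 * All sizes and names here are illustrative assumptions. */
#include <stddef.h>
#include <stdint.h>

#define PL_BLOCK_SIZE  4096u   /* bytes per block (assumed)    */
#define PL_BLOCK_COUNT 64u     /* number of blocks in the pool */

static uint8_t  pl_pool[PL_BLOCK_COUNT][PL_BLOCK_SIZE];
static uint8_t *pl_free_list[PL_BLOCK_COUNT];
static size_t   pl_free_top;

/* Build the free list once at startup. */
void pl_pool_init(void)
{
    for (size_t i = 0; i < PL_BLOCK_COUNT; ++i)
        pl_free_list[i] = pl_pool[i];
    pl_free_top = PL_BLOCK_COUNT;
}

/* O(1) allocation: pop one block, or NULL if the pool is exhausted. */
void *pl_alloc(void)
{
    return (pl_free_top > 0) ? pl_free_list[--pl_free_top] : NULL;
}

/* O(1) release: push the block back onto the free list. */
void pl_free(void *block)
{
    if (block != NULL && pl_free_top < PL_BLOCK_COUNT)
        pl_free_list[pl_free_top++] = (uint8_t *)block;
}
```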
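The 8-bit fixed-point quantization in (3) can be pictured with the simple symmetric, power-of-two-scale scheme sketched below. The abstract does not state the exact quantization scheme or scale selection used in the thesis, so this is an assumed reference formulation only.

```c
/* Sketch of symmetric 8-bit fixed-point quantization of a weight tensor.
 * A per-tensor power-of-two scale (number of fractional bits) is chosen
 * so that the largest magnitude still fits in int8, then each value is
 * rounded. Illustrative only; not necessarily the thesis's exact scheme. */
#include <math.h>
#include <stddef.h>
#include <stdint.h>

/* Choose the number of fractional bits so that max|w| * 2^frac <= 127. */
int choose_frac_bits(const float *w, size_t n)
{
    float max_abs = 0.0f;
    for (size_t i = 0; i < n; ++i)
        if (fabsf(w[i]) > max_abs)
            max_abs = fabsf(w[i]);
    int frac = 7;
    while (frac > 0 && max_abs * (float)(1 << frac) > 127.0f)
        --frac;
    return frac;
}

/* Quantize: q = round(w * 2^frac), clamped to the int8 range. */
void quantize_int8(const float *w, int8_t *q, size_t n, int frac)
{
    const float scale = (float)(1 << frac);
    for (size_t i = 0; i < n; ++i) {
        long v = lroundf(w[i] * scale);
        if (v > 127)  v = 127;
        if (v < -128) v = -128;
        q[i] = (int8_t)v;
    }
}

/* Dequantize back to float for accuracy checks: w ~= q / 2^frac. */
float dequantize_int8(int8_t q, int frac)
{
    return (float)q / (float)(1 << frac);
}
```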
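Likewise, the behaviour of the max-pooling computing unit mentioned in (3) corresponds to the reference loop below for a 2x2, stride-2 window on an int8 feature map; the hardware unit pipelines this computation on the FPGA, but the arithmetic is the same. The data layout and function names are illustrative assumptions.

```c
/* Reference behaviour of a 2x2, stride-2 max-pooling stage on an int8
 * feature map laid out as [channel][row][col]. Layout and names are
 * illustrative assumptions, not taken from the thesis. */
#include <stdint.h>

static inline int8_t max4(int8_t a, int8_t b, int8_t c, int8_t d)
{
    int8_t m = a;
    if (b > m) m = b;
    if (c > m) m = c;
    if (d > m) m = d;
    return m;
}

void maxpool2x2_int8(const int8_t *in, int8_t *out,
                     int channels, int in_h, int in_w)
{
    const int out_h = in_h / 2, out_w = in_w / 2;
    for (int c = 0; c < channels; ++c)
        for (int y = 0; y < out_h; ++y)
            for (int x = 0; x < out_w; ++x) {
                /* Pointer to the top-left pixel of the 2x2 window. */
                const int8_t *p = in + ((c * in_h + 2 * y) * in_w + 2 * x);
                out[(c * out_h + y) * out_w + x] =
                    max4(p[0], p[1], p[in_w], p[in_w + 1]);
            }
}
```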
Keywords/Search Tags:FPGA, Zynq, YOLOv2-Tiny, Neural Network Accelerator