
Acceleration And Optimization Of Deep Learning Algorithm Based On Embedded GPU Platform

Posted on: 2020-06-27
Degree: Master
Type: Thesis
Country: China
Candidate: X Yin
Full Text: PDF
GTID: 2428330623463694
Subject: Electronics and Communications Engineering
Abstract/Summary:
In recent years, with the rapid growth of computing power, deep learning networks have developed quickly and are now widely used in speech recognition, computer vision, natural language processing, and other fields. To extract more effective features, networks have grown ever deeper, with large model sizes and many parameters, and therefore require high-performance GPUs or similar devices for computational support. At the same time, with the rapid development of embedded and mobile devices such as drones, robots, and smartphones, the demand for deploying deep learning networks on these devices has become more intense. However, resources on these real-time platforms (storage, computing power, and battery capacity) are very limited. Accelerating and optimizing deep learning networks on resource-constrained platforms has therefore become an active research topic in both academia and industry. Many network acceleration and optimization algorithms have been proposed, but most target the image classification task; they are rarely combined with object detection, or applied as multiple compression methods within a single task. These gaps are the focus of this thesis.

This thesis first introduces the research background and development trends of deep learning network acceleration and optimization, and comprehensively surveys the main model compression algorithms. Then, targeting the embedded GPU platform NVIDIA Jetson TX2, two tasks are completed.

First, for the classic PASCAL VOC object detection benchmark, we compare popular deep-learning-based object detection algorithms under the Roofline model, and also deploy and verify them on the Jetson TX2. Considering the accuracy, efficiency, and model size of the different algorithms, we redesign an efficient object detection network using depthwise separable convolutions. Compared with YOLOv2, its accuracy drops by 5%, but detection speed increases by 150% and model size shrinks by 80%. Filter-level pruning is then applied to this new network to compress and accelerate it further, increasing detection speed by another 20% and reducing storage by about 55%.

Second, we take S^3FD as the base detection algorithm and optimize it for tiny-face detection under real surveillance cameras. One optimization implements S^3FD's missing network layers in CUDA and optimizes its computation graph, reducing the per-frame detection time on the Jetson TX2 from 0.69 s to 0.45 s. The other applies network quantization with the FP16 and INT8 data types; after calibration and related steps, the per-frame detection time drops further to 0.27 s (FP16) and 0.14 s (INT8). Finally, a tiny-face automatic detection system is built on the optimized network, achieving real-time performance on a PC (GTX 1080Ti).
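The Roofline comparison above can be pictured with a small calculation: attainable throughput is bounded by the minimum of the device's peak compute rate and its memory bandwidth times the kernel's arithmetic intensity. A minimal sketch, using illustrative numbers rather than measured Jetson TX2 specifications:

```python
def roofline(peak_gflops, bandwidth_gbs, intensity_flops_per_byte):
    """Attainable GFLOP/s under the Roofline model:
    min(compute roof, memory bandwidth * arithmetic intensity)."""
    return min(peak_gflops, bandwidth_gbs * intensity_flops_per_byte)

# Illustrative device parameters (hypothetical, not TX2 measurements).
peak, bw = 1000.0, 50.0
for ai in (2.0, 10.0, 200.0):
    print(f"AI={ai:>6} FLOP/B -> {roofline(peak, bw, ai):.0f} GFLOP/s")
# Low-intensity kernels are memory-bound (50 * 2 = 100 GFLOP/s);
# high-intensity kernels hit the compute roof (1000 GFLOP/s).
```

Under this model, detection networks with low arithmetic intensity sit on the bandwidth-limited slope, which is why reducing memory traffic matters as much as reducing FLOPs on embedded GPUs.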
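The parameter savings behind the depthwise separable convolution redesign can be shown with simple arithmetic. This is a generic sketch of the standard cost comparison, not the thesis's actual layer configuration; the layer shape below is hypothetical:

```python
def standard_conv_params(k, c_in, c_out):
    """Parameter count of a standard k x k convolution."""
    return k * k * c_in * c_out

def depthwise_separable_params(k, c_in, c_out):
    """A depthwise k x k convolution (one filter per input channel)
    followed by a 1 x 1 pointwise convolution."""
    depthwise = k * k * c_in
    pointwise = c_in * c_out
    return depthwise + pointwise

# Hypothetical example: a 3x3 layer with 256 input and 256 output channels.
std = standard_conv_params(3, 256, 256)        # 589,824 parameters
sep = depthwise_separable_params(3, 256, 256)  # 67,840 parameters
print(f"parameter reduction: {1 - sep / std:.1%}")  # ~88.5%
```

The same factorization also cuts multiply-accumulate operations by roughly the same ratio, which is what drives the reported speed-up and model-size reduction relative to YOLOv2.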
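Filter-level pruning, as used above to further compress the detection network, typically scores each filter by a saliency measure (commonly the L1 norm of its weights) and removes the lowest-scoring ones. A minimal sketch with toy weights; the scoring criterion here is the common L1-norm heuristic and is not necessarily the exact criterion used in the thesis:

```python
def prune_filters(filters, keep_ratio):
    """Keep the filters with the largest L1 norms.

    filters: list of flattened weight lists, one per filter.
    keep_ratio: fraction of filters to retain.
    Returns the indices of surviving filters, in original order.
    """
    scores = [sum(abs(w) for w in f) for f in filters]
    n_keep = max(1, int(len(filters) * keep_ratio))
    ranked = sorted(range(len(filters)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:n_keep])

# Four toy filters; the two with the smallest L1 norms are pruned away.
filters = [[0.9, -0.8], [0.01, 0.02], [-0.5, 0.6], [0.05, -0.03]]
print(prune_filters(filters, keep_ratio=0.5))  # [0, 2]
```

Because whole filters (and their output channels) are removed, the pruned network stays dense and needs no sparse-matrix support, which is why this style of pruning translates directly into speed and storage gains on a GPU.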
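The INT8 quantization step maps floating-point activations to 8-bit integers using a scale derived during calibration. The sketch below uses simple max-abs symmetric quantization to show the idea; TensorRT's actual calibration (as typically used on the Jetson TX2) is entropy/KL-divergence based, so this is an illustration, not the thesis's exact procedure:

```python
def calibrate_scale(samples):
    """Derive a symmetric INT8 scale from calibration data
    (max-abs calibration; a simplification of entropy calibration)."""
    return max(abs(x) for x in samples) / 127.0

def quantize(x, scale):
    """Map a float to the symmetric INT8 range [-127, 127]."""
    q = round(x / scale)
    return max(-127, min(127, q))

def dequantize(q, scale):
    """Recover an approximate float from the quantized value."""
    return q * scale

# Hypothetical calibration batch of activation values.
activations = [0.0, 0.5, -1.2, 3.1, -2.9]
scale = calibrate_scale(activations)
quantized = [quantize(x, scale) for x in activations]
print(quantized)
```

Running inference in INT8 (or FP16) lets the GPU use narrower arithmetic units and move a quarter (or half) of the data, which is the source of the 0.45 s -> 0.27 s -> 0.14 s per-frame improvements reported above.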
Keywords/Search Tags:Deep Learning, Object Detection, Embedded GPU, Network Compression