Research On Neural Network Parameter Compression And Inference Acceleration

Posted on: 2021-05-28
Degree: Master
Type: Thesis
Country: China
Candidate: Y Zhou
Full Text: PDF
GTID: 2370330623965015
Subject: Computer technology
Abstract/Summary:
In recent years, deep neural networks have been researched and developed at an explosive pace, and their powerful feature extraction and fitting capabilities have led to wide use in image recognition, natural language processing, speech recognition, and other fields. To improve model performance, researchers generally design deeper and more complex networks. This greatly increases the number of parameters and the amount of computation, demands ever more hardware resources (CPU, GPU memory, bandwidth), and makes such models very expensive to run. At the same time, it is very difficult to deploy such complex networks directly on mobile devices with limited computing resources and battery life (such as mobile phones, drones, robots, and smart glasses). This thesis addresses the problem by improving both the compactness of the model and the efficiency of its computation. The main contributions of this work are:
1. Based on the lightweight network MobileNet, Tensor-Train (TT) decomposition is used to compress the 1 × 1 (pointwise) convolutions in the depthwise separable convolutions. An adaptive Tensor-Train decomposition algorithm is proposed to avoid the laborious tuning otherwise needed to find the optimal decomposition ranks. On the CIFAR-10 dataset, the proposed model has only 20%-30% of the parameters of MobileNet.
2. Because Tensor-Train decomposition yields little forward-pass (inference) speedup on GPUs, this work adopts, on top of the adaptive Tensor-Train decomposition, a strategy of smaller decomposition dimensions and moderate ranks to reduce the number of parameters. It also uses a dynamic programming algorithm to find the optimal computation order for each decomposed layer, which reduces the amount of computation in the model.
3. A real-time object detection network is built for mobile devices. Experiments show that, compared with the SSD detection network based on the native MobileNet V2, the proposed method roughly doubles model inference speed: on a Huawei Honor V10 mobile phone, detection throughput increased from 15 FPS to about 30 FPS.
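The abstract does not spell out the adaptive rank-selection algorithm. As a point of reference only, the sketch below uses a different, well-known technique, the standard epsilon-truncated TT-SVD, in which a single relative-error tolerance `eps` determines every TT rank automatically, to compress the weight of a 1 × 1 convolution stored in TT-matrix layout. The function names, the factorizations of C_out and C_in (e.g. 64 = 4 × 4 × 4), and the tolerance value are illustrative assumptions, not the thesis's method or settings.

```python
import numpy as np


def tt_svd(tensor, eps):
    """Epsilon-truncated TT-SVD: factor `tensor` into 3-D cores of shape
    (r_{k-1}, n_k, r_k) whose contraction has relative Frobenius error <= eps."""
    dims = tensor.shape
    d = len(dims)
    # Per-step threshold; d-1 truncations of size delta bound the total error by eps*||A||.
    delta = eps / np.sqrt(d - 1) * np.linalg.norm(tensor) if d > 1 else 0.0
    cores, rank, c = [], 1, tensor
    for k in range(d - 1):
        c = c.reshape(rank * dims[k], -1)
        u, s, vt = np.linalg.svd(c, full_matrices=False)
        # tail[r] = Frobenius norm of the singular values discarded when keeping r of them.
        tail = np.append(np.sqrt(np.cumsum(s[::-1] ** 2))[::-1], 0.0)
        new_rank = max(1, int(np.argmax(tail <= delta)))  # smallest admissible rank
        cores.append(u[:, :new_rank].reshape(rank, dims[k], new_rank))
        c = s[:new_rank, None] * vt[:new_rank, :]  # carry S @ V^T to the next step
        rank = new_rank
    cores.append(c.reshape(rank, dims[-1], 1))
    return cores


def tt_to_full(cores):
    """Contract TT cores back into a dense tensor of shape (n_1, ..., n_d)."""
    full = cores[0]
    for core in cores[1:]:
        full = np.tensordot(full, core, axes=([full.ndim - 1], [0]))
    return full.reshape(full.shape[1:-1])


def _interleave_perm(d):
    # Axis order (m1, n1, m2, n2, ...): core k holds the k-th output and input mode.
    return [i for k in range(d) for i in (k, d + k)]


def conv1x1_to_tt(weight, out_factors, in_factors, eps):
    """Decompose a (C_out, C_in) 1x1-conv weight in TT-matrix layout (hypothetical helper)."""
    d = len(out_factors)
    w = weight.reshape(*out_factors, *in_factors).transpose(_interleave_perm(d))
    return tt_svd(w.reshape([m * n for m, n in zip(out_factors, in_factors)]), eps)


def tt_to_conv1x1(cores, out_factors, in_factors):
    """Rebuild the dense (C_out, C_in) weight from TT-matrix cores (hypothetical helper)."""
    d = len(out_factors)
    shape = [s for m, n in zip(out_factors, in_factors) for s in (m, n)]
    full = tt_to_full(cores).reshape(shape).transpose(np.argsort(_interleave_perm(d)))
    return full.reshape(int(np.prod(out_factors)), int(np.prod(in_factors)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    weight = rng.standard_normal((64, 64))  # hypothetical 1x1 conv, C_out = C_in = 64
    out_f, in_f = (4, 4, 4), (4, 4, 4)      # hypothetical factorizations of 64
    cores = conv1x1_to_tt(weight, out_f, in_f, eps=0.3)
    approx = tt_to_conv1x1(cores, out_f, in_f)
    rel_err = np.linalg.norm(approx - weight) / np.linalg.norm(weight)
    print("core shapes:", [core.shape for core in cores])
    print("params:", sum(core.size for core in cores), "vs dense", weight.size)
    print("relative error: %.3f" % rel_err)
```

Raising `eps` lowers the ranks and the parameter count at the cost of approximation error. The random weight in the usage lines is not low-rank, so the compression it reports says nothing about trained MobileNet layers.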
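The abstract states that dynamic programming selects the computation order of each decomposed layer but gives no details. A minimal sketch of the idea, assuming the decomposed layer can be written as a chain of matrix products applied to the flattened input (a simplification of general TT contraction), is the classic matrix-chain-order DP below. The function name and the example sizes (a hypothetical rank-32 factorization of a 256 × 256 1 × 1 convolution on a 56 × 56 feature map, so HW = 3136) are assumptions for illustration.

```python
from functools import lru_cache


def matrix_chain_order(dims):
    """Matrix i has shape (dims[i], dims[i+1]). Return the minimum number of
    scalar multiply-adds for the whole chain and the parenthesization achieving it."""
    n = len(dims) - 1

    @lru_cache(maxsize=None)
    def best(i, j):
        if i == j:
            return 0, f"M{i}"
        candidates = []
        for k in range(i, j):  # split point: (M_i..M_k)(M_{k+1}..M_j)
            left_cost, left_expr = best(i, k)
            right_cost, right_expr = best(k + 1, j)
            # Multiplying a (p x q) result by a (q x r) result costs p*q*r.
            cost = left_cost + right_cost + dims[i] * dims[k + 1] * dims[j + 1]
            candidates.append((cost, f"({left_expr} {right_expr})"))
        return min(candidates)

    return best(0, n - 1)


if __name__ == "__main__":
    # Hypothetical chain U (256x32) @ V (32x256) @ X (256x3136).
    print(matrix_chain_order((256, 32, 256, 3136)))  # (51380224, '(M0 (M1 M2))')
```

Here the DP prefers U(VX), about 51.4 M multiply-adds, over (UV)X, about 207.6 M, which is worse than the dense 256 × 256 layer (about 205.5 M). The point is that a decomposition only saves computation if its factors are contracted in a good order.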
Keywords/Search Tags: Tensor decomposition, Parameter compression, Quantization, Mobile target detection