
Design Of Heterogeneous Neural Network Accelerator Based On Pruning And Sparsity Optimization

Posted on: 2024-05-21
Degree: Master
Type: Thesis
Country: China
Candidate: S T Li
GTID: 2568307106968659
Subject: Computer Science and Technology
Abstract/Summary:
With the rapid development of deep neural networks, artificial intelligence has outperformed traditional techniques in many fields, and AI based on deep neural networks is now moving onto terminal devices. However, model training is costly and deep learning models demand high computing power, so applying AI efficiently on resource-constrained terminals has become a topic of active research. This thesis targets FPGA-based terminal equipment and studies how to deploy a large deep neural network model efficiently where terminal resources are scarce. Following a software-hardware co-design approach, the software side trains and compresses the model, reducing its parameter count to speed it up, while the hardware side accelerates the model's inference computation.

On the software side, this thesis proposes FLRPM, a structured pruning and compression training algorithm based on channel and normalization constraints. With accuracy unchanged, the number of channels is reduced by between 25% and 65%, depending on the model. FLRPM can compress a model flexibly to match the compression ratio that the available resources require. In the case of extreme compression ratios (below 50%), FLRPM still compresses the model structure stably, without a cliff-like drop in accuracy.

On the FPGA side, this thesis proposes SCFA, an acceleration framework that combines convolution acceleration with sparsity acceleration. The performance of neural network accelerators is further improved with reduced model accuracy. Through its double calculation strategy, SCFA achieves a maximum speedup of 2.2 times and reduces energy consumption by 70% across different model structures. SCFA also greatly relieves the resource bottlenecks of terminal model deployment, including block random access memory (BRAM), look-up tables (LUTs), bandwidth, and digital signal processing units (DSPs), which greatly promotes the practical deployment of deep neural network models on terminal devices.
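The abstract does not give the internals of FLRPM or SCFA, so the following is only a minimal sketch of the two general ideas it names, not the thesis's method. It assumes that channels are ranked by the magnitude of their normalization-layer scale factors (a common structured-pruning heuristic) and that sparsity acceleration means skipping multiplications by zero weights. The function names `select_channels` and `zero_skipping_dot` and the parameter `keep_ratio` are hypothetical.

```python
from typing import List, Sequence, Tuple


def select_channels(bn_scales: Sequence[float], keep_ratio: float) -> List[int]:
    """Return the indices of the channels to keep, ranked by |scale factor|.

    Assumption: a small normalization scale marks a channel as unimportant.
    A 65% channel reduction corresponds to keep_ratio=0.35.
    """
    if not 0.0 < keep_ratio <= 1.0:
        raise ValueError("keep_ratio must be in (0, 1]")
    n_keep = max(1, round(len(bn_scales) * keep_ratio))
    # Rank channel indices from largest to smallest absolute scale.
    ranked = sorted(range(len(bn_scales)),
                    key=lambda i: abs(bn_scales[i]),
                    reverse=True)
    # Return the survivors in their original channel order.
    return sorted(ranked[:n_keep])


def zero_skipping_dot(weights: Sequence[float],
                      activations: Sequence[float]) -> Tuple[float, int]:
    """Dot product that skips zero weights.

    Returns the result and the number of multiplications actually performed,
    which is the work a sparsity-aware datapath would have to do.
    """
    if len(weights) != len(activations):
        raise ValueError("weights and activations must have equal length")
    total = 0.0
    multiplications = 0
    for w, a in zip(weights, activations):
        if w == 0.0:
            continue  # a pruned weight contributes nothing; skip the multiply
        total += w * a
        multiplications += 1
    return total, multiplications


if __name__ == "__main__":
    kept = select_channels([0.9, 0.01, -0.5, 0.02], keep_ratio=0.5)
    print("channels kept:", kept)
    result, work = zero_skipping_dot([0.0, 2.0, 0.0, -1.0], [5.0, 3.0, 7.0, 4.0])
    print("dot product:", result, "multiplications:", work)
```

Removing whole channels, rather than individual weights, keeps the remaining layers dense and regularly shaped, which is generally easier to map onto fixed FPGA compute arrays; zero skipping then exploits whatever fine-grained sparsity is left.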
Keywords/Search Tags:Deep Neural Networks, Model Pruning, Sparsity Acceleration, Convolution Acceleration