
Optimization And Application Of Deep Learning Network For Dedicated NPU

Posted on: 2024-02-17 | Degree: Master | Type: Thesis
Country: China | Candidate: Z C Qiu | Full Text: PDF
GTID: 2568307079972369 | Subject: Electronic information
Abstract/Summary:
With the development of computer intelligence and deep learning technology, deep learning neural network models are increasingly deployed on edge devices rather than on cloud servers. However, deploying such models on edge devices still faces a series of challenges. This thesis focuses on the problems of deploying deep learning neural network models on a dedicated neural network processor (NPU) chip.

The data and computation volume of deep learning models keeps growing, which makes them difficult to run on edge devices with limited memory and computing power. To address this problem, this thesis builds on model quantization and compares post-training quantization with quantization-aware training for the models to be quantized. Because applying quantization-aware training to the whole model is inefficient, an improved selective quantization method based on sensitivity information is proposed. By analyzing the sensitivity of each network layer and combining this analysis with quantization-aware training, the method yields a quantization scheme with better quality and efficiency.

The hardware architectures and software stacks of edge devices are complex and diverse, so it is difficult to deploy deep learning models to different edge devices through a single, efficient solution. To solve this problem, this thesis designs a deep learning compilation and deployment method for the dedicated NPU. Based on compiler-optimization ideas, the method converts models from various deep learning frameworks and formats into a unified computation-graph intermediate representation, and abstracts the operators of neural network models at the software level. Optimizations for the NPU are then performed at both the graph level and the operator level, so that the optimized model satisfies the NPU chip's constraints on memory, bandwidth, power consumption, and other resources. Code-generation technology is then used to produce target machine code for different hardware platforms, replacing the inefficient practice of hand-implementing neural network operators in traditional deployment schemes, so that models can be deployed and run efficiently on the resource-limited NPU chip.

Finally, this thesis designs and implements the corresponding deep learning model optimization and deployment system, and verifies and tests it on an NPU chip development board. The test results show that the system meets practical application requirements in terms of functionality, deployment latency, and inference accuracy, achieving the expected design goals.
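The sensitivity-based selection described above can be sketched as follows. This is a minimal illustration, not the thesis's actual algorithm: the layer names, accuracy numbers, and threshold are all hypothetical, and sensitivity is approximated here as the accuracy drop observed when only one layer is quantized.

```python
# Hypothetical sketch of sensitivity-based selective quantization.
# All layer names and accuracy figures below are illustrative.

def measure_sensitivity(baseline_acc, acc_with_only_layer_quantized):
    """Sensitivity of a layer = accuracy drop when only that layer is quantized."""
    return {layer: baseline_acc - acc
            for layer, acc in acc_with_only_layer_quantized.items()}

def select_layers_to_quantize(sensitivity, threshold):
    """Quantize only low-sensitivity layers; sensitive layers stay in higher
    precision (or are refined further with quantization-aware training)."""
    return sorted(layer for layer, s in sensitivity.items() if s <= threshold)

baseline = 0.912
per_layer = {"conv1": 0.910, "conv2": 0.871, "fc": 0.908}

sens = measure_sensitivity(baseline, per_layer)
plan = select_layers_to_quantize(sens, threshold=0.01)
print(plan)  # conv2 loses ~4 points of accuracy, so it is excluded
```

In this sketch, layers whose quantization barely affects accuracy are quantized directly, while the sensitive layer is left for higher precision or quantization-aware fine-tuning.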
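The graph-level optimization stage of the compilation flow can be illustrated with a toy pass over a unified computation-graph representation. This is only a sketch of the general idea, assuming a linear list of nodes; the `Node` structure, operator names, and the conv+relu fusion rule are illustrative, not the thesis's actual intermediate representation.

```python
# Toy computation-graph IR with one fusion pass (conv2d + relu -> conv2d_relu),
# in the spirit of graph-level NPU optimization. Names are illustrative.

from dataclasses import dataclass, field

@dataclass
class Node:
    op: str
    inputs: list = field(default_factory=list)

def fuse_conv_relu(graph):
    """Fold a relu into the conv2d immediately before it, cutting one
    intermediate tensor and the memory traffic it would cost on the NPU."""
    fused, skip = [], set()
    for i, node in enumerate(graph):
        if i in skip:
            continue
        nxt = graph[i + 1] if i + 1 < len(graph) else None
        if node.op == "conv2d" and nxt is not None and nxt.op == "relu":
            fused.append(Node("conv2d_relu", node.inputs))
            skip.add(i + 1)  # relu is absorbed into the fused node
        else:
            fused.append(node)
    return fused

graph = [Node("conv2d", ["x", "w"]), Node("relu"), Node("maxpool")]
print([n.op for n in fuse_conv_relu(graph)])  # ['conv2d_relu', 'maxpool']
```

After passes like this, operator-level tuning and code generation would lower each remaining node to machine code for the target NPU.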
Keywords/Search Tags: Deep Learning, Neural Network Processor, Model Quantization and Deployment, Selective Layer-by-Layer Quantization, Deep Learning Compilation