
Optimization And Application Of Deep Learning Network For Dedicated NPU

Posted on: 2024-02-17 | Degree: Master | Type: Thesis
Country: China | Candidate: Z C Qiu | Full Text: PDF
GTID: 2568307079972369 | Subject: Electronic information
Abstract/Summary:
With the development of computer intelligence and deep learning technology, deep learning neural network models are increasingly deployed on edge devices rather than on cloud servers. However, deploying such models on edge devices still faces a series of challenges. This thesis focuses on the problems of deploying deep learning neural network models on a dedicated neural network processor (NPU) chip.

The data and computation volume of deep learning models keeps growing, which makes them difficult to run on edge devices with limited memory and computing power. To address this problem, this thesis builds on model quantization and compares post-training quantization with quantization-aware training for the models to be quantized. Because applying quantization-aware training to the whole model is inefficient, an improved selective quantization method based on sensitivity information is proposed. By analyzing the sensitivity of each network layer and combining this analysis with quantization-aware training, the method yields a quantization scheme with better quality and efficiency.

The hardware architectures and software stacks of edge devices are complex and diverse, so it is difficult to deploy deep learning models to different edge devices through a single, efficient solution. To solve this problem, this thesis designs a deep learning compilation and deployment method for the dedicated NPU. Based on compiler-optimization ideas, the method converts models from various deep learning frameworks and formats into a unified computation-graph intermediate representation, and abstracts the operators of neural network models at the software level. Optimizations for the NPU are then performed at both the graph level and the operator level, so that the optimized model satisfies the NPU chip's constraints on memory, bandwidth, power consumption, and other resources. Code-generation technology is then used to produce target machine code for different hardware platforms, replacing the inefficient practice of hand-implementing neural network operators in traditional deployment schemes, so that models can be deployed and run efficiently on the resource-limited NPU chip.

Finally, this thesis designs and implements the corresponding deep learning model optimization and deployment system, and verifies and tests it on an NPU chip development board. The test results show that the system meets practical application requirements in terms of functionality, deployment latency, and inference accuracy, achieving the expected design goals.
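The sensitivity-based selection described above can be sketched as follows. This is a minimal illustration, not the thesis's actual algorithm: the layer names, accuracy numbers, and threshold are all hypothetical, and sensitivity is approximated here as the accuracy drop observed when only one layer is quantized.

```python
# Hypothetical sketch of sensitivity-based selective quantization.
# All layer names and accuracy figures below are illustrative.

def measure_sensitivity(baseline_acc, acc_with_only_layer_quantized):
    """Sensitivity of a layer = accuracy drop when only that layer is quantized."""
    return {layer: baseline_acc - acc
            for layer, acc in acc_with_only_layer_quantized.items()}

def select_layers_to_quantize(sensitivity, threshold):
    """Quantize only low-sensitivity layers; sensitive layers stay in higher
    precision (or are refined further with quantization-aware training)."""
    return sorted(layer for layer, s in sensitivity.items() if s <= threshold)

baseline = 0.912
per_layer = {"conv1": 0.910, "conv2": 0.871, "fc": 0.908}

sens = measure_sensitivity(baseline, per_layer)
plan = select_layers_to_quantize(sens, threshold=0.01)
print(plan)  # conv2 loses ~4 points of accuracy, so it is excluded
```

In this sketch, layers whose quantization barely affects accuracy are quantized directly, while the sensitive layer is left for higher precision or quantization-aware fine-tuning.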
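The graph-level optimization stage of the compilation flow can be illustrated with a toy pass over a unified computation-graph representation. This is only a sketch of the general idea, assuming a linear list of nodes; the `Node` structure, operator names, and the conv+relu fusion rule are illustrative, not the thesis's actual intermediate representation.

```python
# Toy computation-graph IR with one fusion pass (conv2d + relu -> conv2d_relu),
# in the spirit of graph-level NPU optimization. Names are illustrative.

from dataclasses import dataclass, field

@dataclass
class Node:
    op: str
    inputs: list = field(default_factory=list)

def fuse_conv_relu(graph):
    """Fold a relu into the conv2d immediately before it, cutting one
    intermediate tensor and the memory traffic it would cost on the NPU."""
    fused, skip = [], set()
    for i, node in enumerate(graph):
        if i in skip:
            continue
        nxt = graph[i + 1] if i + 1 < len(graph) else None
        if node.op == "conv2d" and nxt is not None and nxt.op == "relu":
            fused.append(Node("conv2d_relu", node.inputs))
            skip.add(i + 1)  # relu is absorbed into the fused node
        else:
            fused.append(node)
    return fused

graph = [Node("conv2d", ["x", "w"]), Node("relu"), Node("maxpool")]
print([n.op for n in fuse_conv_relu(graph)])  # ['conv2d_relu', 'maxpool']
```

After passes like this, operator-level tuning and code generation would lower each remaining node to machine code for the target NPU.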
Keywords/Search Tags: Deep Learning, Neural Network Processor, Model Quantization and Deployment, Selective Layer-by-Layer Quantization, Deep Learning Compilation