
Research And Design Of Deep Neural Network Compression Algorithm Based On Processing-in-Memory Framework

Posted on: 2024-03-19    Degree: Master    Type: Thesis
Country: China    Candidate: J H Wu    Full Text: PDF
GTID: 2568306920451534    Subject: Computer Science and Technology
Abstract/Summary:
Because of the shortcomings of the traditional cloud-centered computing model, such as high power consumption, long communication latency, and the risk of exposing private personal data, deep learning applications are gradually migrating from the cloud to embedded devices. However, deploying complex deep neural network models on edge devices with limited hardware resources leads to problems such as insufficient storage space, low energy efficiency, and long execution time. Processing-In-Memory (PIM) technology can significantly reduce hardware energy consumption and execution latency by eliminating data movement between storage and computing units. Among PIM technologies, ReRAM-based processing-in-memory is naturally suited to deep neural network workloads: its crossbar structure offers high parallelism and can perform vector-matrix multiply-accumulate operations in place. It thereby eliminates the high transmission energy cost of massive data movement, breaks the bandwidth bottleneck between computing and memory units, further improves system computing capacity, and offers a practical path for deploying deep learning applications more widely on edge devices.

However, because the storage resources of ReRAM are limited, deploying a large deep neural network model on it can saturate the hardware. Deep neural network compression algorithms can further reduce the storage and computing overhead of deep learning applications, but because the ReRAM crossbar architecture is tightly coupled, applying them naively causes misaligned multiplication between the input data and the network weights, or incorrect accumulation of the partial sums of products. In addition, existing compression algorithms for ReRAM-based PIM architectures usually compress the model from only a single perspective. How to further reduce the storage and computing overhead caused by the redundant weights of the model by coordinating different compression algorithms from multiple perspectives is therefore an urgent problem.

To solve these problems, this thesis first designs a weight-pattern-reuse-aware pruning algorithm, which reduces the storage and computing overhead of the model from multiple perspectives by jointly exploiting the sparsity and repeatability of the network weights (a minimal sketch of the idea follows below), and which adapts well to the underlying ReRAM-based processing-in-memory architecture. The thesis then designs a ReRAM-based weight-pattern-reusable deep neural network accelerator for the proposed pruning algorithm; its small execution granularity effectively exploits the sparsity and repeatability of the weights and significantly reduces the computation and storage overhead caused by insensitive and repeated weights. Finally, a prototype of the accelerator was implemented on the MNSIM simulator, the proposed compression algorithm was applied to several classic deep neural network models, and a breakdown experiment was designed to verify the effectiveness of the proposed compression algorithm.
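To make the two forms of weight redundancy concrete, the following Python sketch shows one plausible reading of the approach: magnitude pruning supplies sparsity, and a greedy clustering of crossbar-tile-sized weight blocks supplies repeatability, so near-duplicate tiles share one stored pattern. The function names, block size, and thresholds are hypothetical illustrations, not the thesis's actual algorithm.

import numpy as np

def prune_small_weights(w, threshold=0.05):
    # Sparsity: zero out weights whose magnitude is below the threshold.
    w = w.copy()
    w[np.abs(w) < threshold] = 0.0
    return w

def extract_blocks(w, block=4):
    # Split the weight matrix into crossbar-tile-sized blocks, matching
    # the small execution granularity of the accelerator.
    rows, cols = w.shape
    return [w[i:i + block, j:j + block]
            for i in range(0, rows, block)
            for j in range(0, cols, block)]

def cluster_patterns(blocks, tol=0.05):
    # Repeatability: greedily merge blocks that are element-wise close,
    # so repeated blocks reuse a single stored pattern.
    patterns, assignment = [], []
    for b in blocks:
        for k, p in enumerate(patterns):
            if np.max(np.abs(b - p)) <= tol:
                assignment.append(k)
                break
        else:
            patterns.append(b.copy())
            assignment.append(len(patterns) - 1)
    return patterns, assignment

def crossbar_vmm(w, x):
    # On a ReRAM crossbar, weights are programmed as cell conductances and
    # the vector-matrix product is read out as accumulated bit-line
    # currents; in software the analog step reduces to a matrix-vector
    # product.
    return w @ x

rng = np.random.default_rng(0)
w = rng.normal(scale=0.2, size=(8, 8))
w[4:, :] = w[:4, :]  # inject repeated tiles so clustering has work to do
pruned = prune_small_weights(w)
patterns, assignment = cluster_patterns(extract_blocks(pruned))
print(f"{len(assignment)} tiles stored as {len(patterns)} shared patterns")
y = crossbar_vmm(pruned, rng.normal(size=8))

Storing only the shared patterns plus a per-tile index is what saves crossbar cells in this reading: each group of near-identical tiles is programmed once and referenced many times, while the pruned zeros need not be computed at all.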
Experimental results show that the proposed compression algorithm removes more of the storage and computing overhead caused by the redundant weights of the model, achieving a 1.64x performance improvement and a 1.51x energy-efficiency improvement over state-of-the-art ReRAM-based DNN accelerators.
Keywords/Search Tags:Processing-In-Memory, Deep Neural Network, Non-Volatile Memory, Hardware Acceleration