
Research And Design Of Deep Neural Network Compression Algorithm Based On Processing-in-Memory Framework

Posted on: 2024-03-19    Degree: Master    Type: Thesis
Country: China    Candidate: J H Wu    Full Text: PDF
GTID: 2568306920451534    Subject: Computer Science and Technology
Abstract/Summary:
Because of the shortcomings of the traditional cloud-centered computing model, such as high power consumption, long communication latency, and the risk of exposing private personal data, deep learning applications are gradually migrating from the cloud to embedded devices. However, deploying complex deep neural network models on edge devices with limited hardware resources leads to problems such as insufficient storage space, low energy efficiency, and long execution time. Processing-In-Memory (PIM) technology can significantly reduce hardware energy consumption and execution latency by eliminating data movement between storage and computing units. Among PIM technologies, ReRAM-based processing-in-memory is naturally suited to deep neural network workloads: its crossbar structure offers high parallelism and can perform vector-matrix multiply-accumulate operations in place. It thereby eliminates the high transmission energy cost of massive data movement, breaks the bandwidth bottleneck between computing and memory units, further improves system computing capacity, and offers a practical path for deploying deep learning applications more widely on edge devices.

However, because the storage resources of ReRAM are limited, deploying a large deep neural network model on it can saturate the hardware. Deep neural network compression algorithms can further reduce the storage and computing overhead of deep learning applications, but because the ReRAM crossbar architecture is tightly coupled, applying them naively causes misaligned multiplication between the input data and the network weights, or incorrect accumulation of the partial sums of products. In addition, existing compression algorithms for ReRAM-based PIM architectures usually compress the model from only a single perspective. How to further reduce the storage and computing overhead caused by the redundant weights of the model by coordinating different compression algorithms from multiple perspectives is therefore an urgent problem.

To solve these problems, this thesis first designs a weight-pattern-reuse-aware pruning algorithm, which reduces the storage and computing overhead of the model from multiple perspectives by jointly exploiting the sparsity and repeatability of the network weights (a minimal sketch of the idea follows below), and which adapts well to the underlying ReRAM-based processing-in-memory architecture. The thesis then designs a ReRAM-based weight-pattern-reusable deep neural network accelerator for the proposed pruning algorithm; its small execution granularity effectively exploits the sparsity and repeatability of the weights and significantly reduces the computation and storage overhead caused by insensitive and repeated weights. Finally, a prototype of the accelerator was implemented on the MNSIM simulator, the proposed compression algorithm was applied to several classic deep neural network models, and a breakdown experiment was designed to verify the effectiveness of the proposed compression algorithm.
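To make the two forms of weight redundancy concrete, the following Python sketch shows one plausible reading of the approach: magnitude pruning supplies sparsity, and a greedy clustering of crossbar-tile-sized weight blocks supplies repeatability, so near-duplicate tiles share one stored pattern. The function names, block size, and thresholds are hypothetical illustrations, not the thesis's actual algorithm.

import numpy as np

def prune_small_weights(w, threshold=0.05):
    # Sparsity: zero out weights whose magnitude is below the threshold.
    w = w.copy()
    w[np.abs(w) < threshold] = 0.0
    return w

def extract_blocks(w, block=4):
    # Split the weight matrix into crossbar-tile-sized blocks, matching
    # the small execution granularity of the accelerator.
    rows, cols = w.shape
    return [w[i:i + block, j:j + block]
            for i in range(0, rows, block)
            for j in range(0, cols, block)]

def cluster_patterns(blocks, tol=0.05):
    # Repeatability: greedily merge blocks that are element-wise close,
    # so repeated blocks reuse a single stored pattern.
    patterns, assignment = [], []
    for b in blocks:
        for k, p in enumerate(patterns):
            if np.max(np.abs(b - p)) <= tol:
                assignment.append(k)
                break
        else:
            patterns.append(b.copy())
            assignment.append(len(patterns) - 1)
    return patterns, assignment

def crossbar_vmm(w, x):
    # On a ReRAM crossbar, weights are programmed as cell conductances and
    # the vector-matrix product is read out as accumulated bit-line
    # currents; in software the analog step reduces to a matrix-vector
    # product.
    return w @ x

rng = np.random.default_rng(0)
w = rng.normal(scale=0.2, size=(8, 8))
w[4:, :] = w[:4, :]  # inject repeated tiles so clustering has work to do
pruned = prune_small_weights(w)
patterns, assignment = cluster_patterns(extract_blocks(pruned))
print(f"{len(assignment)} tiles stored as {len(patterns)} shared patterns")
y = crossbar_vmm(pruned, rng.normal(size=8))

Storing only the shared patterns plus a per-tile index is what saves crossbar cells in this reading: each group of near-identical tiles is programmed once and referenced many times, while the pruned zeros need not be computed at all.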
Experimental results show that the proposed compression algorithm removes more of the storage and computing overhead caused by the redundant weights of the model, achieving a 1.64x performance improvement and a 1.51x energy-efficiency improvement over state-of-the-art ReRAM-based DNN accelerators.
Keywords/Search Tags:Processing-In-Memory, Deep Neural Network, Non-Volatile Memory, Hardware Acceleration