
A DNN Inference Acceleration Method For Resource-Constrained Processing-in-Memory Chips

Posted on: 2024-01-12    Degree: Master    Type: Thesis
Country: China    Candidate: X Gao    Full Text: PDF
GTID: 2568306923470504    Subject: Network and information security
Abstract/Summary:
Conventional computer systems use a von Neumann architecture with separate processor and memory. Because of limited communication bandwidth, data movement between the processor and memory becomes the performance bottleneck when running memory-intensive programs. Processing-in-Memory (PIM) accelerators perform computation in situ and eliminate data movement between memory and processor, making them an effective solution to this bottleneck. Among PIM devices, Resistive Random Access Memory (ReRAM), which offers access performance similar to DRAM and supports in-place matrix-vector multiplication, has been widely explored for accelerating Deep Neural Networks (DNNs).

Most existing studies of ReRAM-based DNN accelerators assume that all weights of a DNN can be programmed into the ReRAM crossbars at once. However, at current technology scaling, the ReRAM PIM chip capacity of an area-constrained embedded or edge device (typically dozens of Mbits) is substantially smaller than the weight size of current DNN models (e.g., 548 MB for VGGNet). It is therefore impractical to deploy all of the network's weights on ReRAM offline before inference. Instead, partial weights must be deployed one batch at a time, so completing the inference of a single input image requires multiple deployments with non-negligible programming latency. Some recent works have considered the limitation of PIM resources and proposed reusing the weights already programmed onto the chip to batch-process images, but no work has yet discussed how to exploit weight similarity to reduce weight-programming overhead.

The goal of this thesis is to design a programming-latency-aware DNN inference framework for resource-constrained ReRAM devices. The framework statically plans and schedules the weight blocks of the neural network according to the device's available resources, reducing weight-programming latency in the online phase and thus optimizing overall inference latency. Two challenges must be addressed to achieve this goal: 1) redesigning the mapping from DNN weights to each operation unit (OU) of the ReRAM chip to maximize the per-bit reuse benefit; 2) correctly activating the statically planned weight blocks at runtime to achieve accurate and efficient DNN inference.

To address the first challenge, we categorize the write characteristics of ReRAM crossbars discussed in the literature and conduct empirical studies to identify their impact on DNN weight-programming latency. We then model the programming latency of ReRAM and design a hierarchical optimization strategy for the proposed weight-programming-aware framework. To address the second challenge, we customize a corresponding OU scheduler to guarantee accurate and efficient DNN inference.

The proposed framework is evaluated on five standard DNN models and five natural language processing models. The results show that the static scheduling strategy proposed in this thesis achieves significant speedups, reducing overall latency by up to 52.91% compared with state-of-the-art techniques.
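The in-place matrix-vector multiplication that makes ReRAM attractive can be sketched in software. In a crossbar, weights are stored as cell conductances, inputs arrive as row voltages, and each column current is the dot product of the voltage vector with that column's conductances (Ohm's law plus Kirchhoff's current law). The sketch below is a minimal illustrative model of that analog computation, not the accelerator design from this thesis:

```python
def crossbar_mvm(conductances, voltages):
    """Analog MVM on a crossbar: each column current is the sum over rows
    of conductances[row][col] * voltages[row] (Ohm's + Kirchhoff's laws)."""
    cols = len(conductances[0])
    return [sum(g_row[j] * v for g_row, v in zip(conductances, voltages))
            for j in range(cols)]

# A 3x2 crossbar: two dot products computed in one "analog" step.
G = [[1.0, 0.5],
     [2.0, 0.0],
     [0.0, 1.5]]
v = [1.0, 2.0, 3.0]
print(crossbar_mvm(G, v))  # → [5.0, 5.0]
```

In hardware, all column currents appear simultaneously, which is why the crossbar performs the whole matrix-vector product in a single step rather than row by row.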
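The intuition behind exploiting weight similarity can also be illustrated with a toy latency model: if only the cells whose stored value differs from the incoming weight block are rewritten, programming cost scales with the Hamming distance between blocks rather than with block size. The per-cell write latency `T_WRITE` and the example blocks below are made up for illustration; they are not parameters or data from this thesis:

```python
T_WRITE = 100.0  # hypothetical per-cell write latency (arbitrary units)

def flatten(block):
    return [cell for row in block for cell in row]

def full_program_cost(block):
    """Naive deployment: every cell of the block is written."""
    return len(flatten(block)) * T_WRITE

def differential_program_cost(resident, incoming):
    """Similarity-aware deployment: rewrite only the differing cells."""
    diffs = sum(a != b for a, b in zip(flatten(resident), flatten(incoming)))
    return diffs * T_WRITE

resident = [[1, 0, 1], [0, 1, 1]]   # block already on the crossbar
incoming = [[1, 0, 0], [0, 1, 1]]   # next block to deploy (one cell differs)
print(full_program_cost(incoming))                    # → 600.0
print(differential_program_cost(resident, incoming))  # → 100.0
```

Under this toy model, a static scheduler would order weight-block deployments so that consecutive blocks mapped to the same crossbar region are as similar as possible, which is the kind of reuse benefit the proposed framework plans for offline.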
Keywords/Search Tags:PIM, ReRAM, resource-constrained, DNN accelerator