A convolutional neural network (CNN) is a feed-forward neural network built on the convolution operation, with a wide range of applications in image recognition, audio recognition, and other fields. As CNNs continue to develop, their weight counts and network depths keep growing, placing ever-higher demands on computing power. Traditional CNN accelerators are based on the von Neumann architecture, in which more than 80% of the power is consumed moving data. Academia and industry have therefore turned their attention to processing-in-memory CNN accelerators that depart from the von Neumann architecture. As the name implies, a processing-in-memory architecture physically integrates the storage module and the computing module so that the memory itself has computing capability, eliminating the time and energy cost of data movement. The CNN accelerator proposed in this paper is based on the widely used memory FLASH. Because FLASH is non-volatile, data are retained even after power-off, which is convenient in applications.

In this paper, we propose a processing-in-memory architecture based on SMIC 40 nm 1 Mb ML-FLASH and carry out its hardware design. Because processing-in-memory chips do not yet have a complete industrial chain, there are no mature EDA tools for comprehensive simulation and verification, so this paper proposes a modeling method for processing-in-memory architectures based on non-volatile memory. Taking into account process variation, the number of activated cells, the integral nonlinearity of the input and output modules, and the quantization error of the readout circuit, we model the proposed ML-FLASH CIM architecture. To quantize full-precision inputs and weights to 4 bits, we propose a 1/n top-value quantization scheme and an adaptive amplification quantization scheme that improve inference accuracy.

We also build and train multiple CNN models to verify the proposed accelerator. Applied to an improved VGG-16 network, it achieves 92.38% inference accuracy. For 4-bit multiply-accumulate (MAC) operations, the proposed CNN accelerator based on the FLASH processing-in-memory architecture achieves a peak throughput of 250 GOPS and an energy efficiency of 35.6 TOPS/W.
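As a generic sketch of 4-bit quantization (not the paper's exact method, whose details are not given in the abstract), one plausible reading of a "1/n top value" scheme is to clip at 1/n of the maximum-magnitude value and then quantize uniformly to signed 4-bit codes. The function names, the choice n=2, and the symmetric code range [-7, 7] are illustrative assumptions:

```python
import numpy as np

def quantize_4bit(x, n=2):
    """Quantize full-precision values to signed 4-bit codes.

    Clips at 1/n of the maximum absolute value (one interpretation of a
    "1/n top value" threshold; the paper's scheme may differ), then maps
    the clipped range onto the symmetric code set {-7, ..., 7}.
    """
    threshold = np.max(np.abs(x)) / n          # clip range: [-threshold, threshold]
    scale = threshold / 7.0                    # real-valued step per 4-bit code
    codes = np.clip(np.round(x / scale), -7, 7).astype(np.int8)
    return codes, scale

def dequantize(codes, scale):
    """Map 4-bit codes back into the real-valued domain."""
    return codes.astype(np.float32) * scale

# Example: quantize a random weight vector and inspect the mean error
rng = np.random.default_rng(0)
w = rng.normal(0.0, 0.05, size=64).astype(np.float32)
q, s = quantize_4bit(w, n=2)
w_hat = dequantize(q, s)
print("mean abs error:", np.abs(w_hat - w).mean())
```

With such a scheme, lowering the clip threshold (larger n) trades saturation of the largest values for finer resolution on the bulk of the distribution, which is typically why a top-value fraction rather than the raw maximum is used as the quantization range.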