A convolutional neural network (CNN) is a feed-forward neural network built on the convolution operation, with a wide range of applications in image recognition, audio recognition, and other fields. As CNNs continue to develop, their weight counts and network depths keep growing, placing ever-higher demands on computing power. Traditional CNN accelerators are based on the von Neumann architecture, in which more than 80% of the power is consumed moving data. Academia and industry have therefore turned their attention to processing-in-memory CNN accelerators that depart from the von Neumann architecture. As the name implies, a processing-in-memory architecture physically integrates the storage module and the computing module so that the memory itself has computing capability, eliminating the time and energy cost of data movement. The CNN accelerator proposed in this paper is based on the widely used memory FLASH. Because FLASH is non-volatile, data are retained even after power-off, which is convenient in applications.

In this paper, we propose a processing-in-memory architecture based on SMIC 40 nm 1 Mb ML-FLASH and carry out its hardware design. Because processing-in-memory chips do not yet have a complete industrial chain, there are no mature EDA tools for comprehensive simulation and verification, so this paper proposes a modeling method for processing-in-memory architectures based on non-volatile memory. Taking into account process variation, the number of activated cells, the integral nonlinearity of the input and output modules, and the quantization error of the readout circuit, we model the proposed ML-FLASH CIM architecture. To quantize full-precision inputs and weights to 4 bits, we propose a 1/n top-value quantization scheme and an adaptive amplification quantization scheme that improve inference accuracy.

We also build and train multiple CNN models to verify the proposed accelerator. Applied to an improved VGG-16 network, it achieves 92.38% inference accuracy. For 4-bit multiply-accumulate (MAC) operations, the proposed CNN accelerator based on the FLASH processing-in-memory architecture achieves a peak throughput of 250 GOPS and an energy efficiency of 35.6 TOPS/W.
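As a generic sketch of 4-bit quantization (not the paper's exact method, whose details are not given in the abstract), one plausible reading of a "1/n top value" scheme is to clip at 1/n of the maximum-magnitude value and then quantize uniformly to signed 4-bit codes. The function names, the choice n=2, and the symmetric code range [-7, 7] are illustrative assumptions:

```python
import numpy as np

def quantize_4bit(x, n=2):
    """Quantize full-precision values to signed 4-bit codes.

    Clips at 1/n of the maximum absolute value (one interpretation of a
    "1/n top value" threshold; the paper's scheme may differ), then maps
    the clipped range onto the symmetric code set {-7, ..., 7}.
    """
    threshold = np.max(np.abs(x)) / n          # clip range: [-threshold, threshold]
    scale = threshold / 7.0                    # real-valued step per 4-bit code
    codes = np.clip(np.round(x / scale), -7, 7).astype(np.int8)
    return codes, scale

def dequantize(codes, scale):
    """Map 4-bit codes back into the real-valued domain."""
    return codes.astype(np.float32) * scale

# Example: quantize a random weight vector and inspect the mean error
rng = np.random.default_rng(0)
w = rng.normal(0.0, 0.05, size=64).astype(np.float32)
q, s = quantize_4bit(w, n=2)
w_hat = dequantize(q, s)
print("mean abs error:", np.abs(w_hat - w).mean())
```

With such a scheme, lowering the clip threshold (larger n) trades saturation of the largest values for finer resolution on the bulk of the distribution, which is typically why a top-value fraction rather than the raw maximum is used as the quantization range.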