With the rapid development of machine learning fields such as computer vision, natural language processing, and statistical learning, the bottlenecks of the von Neumann architecture used in modern computer systems are becoming apparent. On the one hand, the gap between processor operating frequency and memory access speed keeps widening. On the other hand, data mining and data analysis with various algorithms generate ever-growing volumes of data every day. Designs that break the von Neumann bottleneck are therefore increasingly important, and in-memory computing has been proposed as an emerging architecture. Static random-access memory (SRAM), commonly used as cache, has become a major research focus in in-memory computing because of its high speed, absence of refresh requirements, low power consumption, and high process compatibility. Convolutional neural networks (CNNs) are essential to machine learning and are widely used in computer vision and speech recognition. In this paper, we combine SRAM in-memory computing with CNNs to achieve the following: (1) a new multiplication architecture based on conventional 6T SRAM cells that performs multiply-accumulate (MAC) operations with signed bits; (2) a capacitor-sharing module that implements both MAC and max-pooling functions; (3) a current-mirror negative-feedback circuit that increases the bit-line voltage swing and improves computational linearity; and (4) a data scheduling method in which convolution is completed by changing only the movement of convolution kernel data. With these designs, the convolution operation, which involves the largest amount of data and the most frequent data movement in a CNN, is performed in memory. This greatly reduces data migration between processor and memory and improves computational speed and energy efficiency. Circuit simulation results show that, compared with conventional in-memory computing circuits, the proposed signed multi-bit MAC circuit for CNN applications reduces integral nonlinearity (INL) by about 74% and bit-line voltage fluctuation by 75.4%. In addition, the reference column suppresses the temperature-dependent bit-line voltage fluctuation found in conventional in-memory computing. With 3-bit convolution inputs, 3-bit weights, and a 5-bit MAC output, the proposed circuit achieves a throughput of 12.8 GOPS and an energy efficiency of 34.85 TOPS/W. Taken together, these results show that the proposed signed multi-bit MAC circuit for CNN applications offers high linearity, high stability, and high efficiency.
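The abstract describes the macro only at the circuit level. As a purely behavioral reference, the sketch below models the arithmetic it targets: signed 3-bit inputs and 3-bit weights, a MAC result reduced to a 5-bit signed output, and max pooling, with the input feature map held in place while only the kernel window position changes. Several details are assumptions, not the paper's design: two's-complement encoding, the shift-and-saturate output quantizer, the hypothetical `shift` parameter, and the 2×2 pooling window. All analog effects (bit-line swing, nonlinearity, temperature drift) are ignored.

```python
import numpy as np

# Bit widths quoted in the abstract; two's-complement encoding is an assumption.
IN_BITS, W_BITS, OUT_BITS = 3, 3, 5


def signed_range(bits):
    """Return (min, max) of a two's-complement integer with `bits` bits."""
    return -(1 << (bits - 1)), (1 << (bits - 1)) - 1


def requantize(acc, shift):
    """Hypothetical output quantizer: arithmetic right shift, then saturate to OUT_BITS."""
    lo, hi = signed_range(OUT_BITS)
    return np.clip(acc >> shift, lo, hi)


def conv2d_stationary_input(x, k, shift=2):
    """Valid 2-D convolution (cross-correlation, as used in CNNs).

    The input map `x` is never copied or moved; only the kernel window
    position (i, j) changes, mirroring the idea of scheduling kernel data only.
    """
    x, k = np.asarray(x, dtype=np.int64), np.asarray(k, dtype=np.int64)
    xlo, xhi = signed_range(IN_BITS)
    wlo, whi = signed_range(W_BITS)
    if x.min() < xlo or x.max() > xhi or k.min() < wlo or k.max() > whi:
        raise ValueError("operand outside the assumed signed bit range")
    kh, kw = k.shape
    out = np.empty((x.shape[0] - kh + 1, x.shape[1] - kw + 1), dtype=np.int64)
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # Signed MAC over one window: sum of 3-bit x 3-bit products.
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * k)
    return requantize(out, shift)


def max_pool2x2(y):
    """2x2 max pooling with stride 2 (a trailing odd row/column is dropped)."""
    h, w = y.shape[0] // 2 * 2, y.shape[1] // 2 * 2
    return y[:h, :w].reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))


if __name__ == "__main__":
    y = conv2d_stationary_input(np.full((4, 4), 3), np.full((3, 3), 3))
    print(y)               # every window sums to 81, 81 >> 2 = 20, saturated to 15
    print(max_pool2x2(y))  # [[15]]
```

Under these assumptions, a 4×4 input of all 3s convolved with a 3×3 kernel of all 3s gives window sums of 81, which the assumed shift-by-2 quantizer saturates to 15, the top of the 5-bit signed range. A real macro would replace the exact integer sum with a charge-domain accumulation on the shared capacitors and an analog-to-digital conversion.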
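If the throughput and energy-efficiency figures were obtained at the same operating point (an assumption, since the abstract does not state this), they imply an average power and an energy per operation of roughly:

$$
P \approx \frac{\text{throughput}}{\text{energy efficiency}} = \frac{12.8\times10^{9}\ \text{OP/s}}{34.85\times10^{12}\ \text{OP/J}} \approx 3.67\times10^{-4}\ \text{W} \approx 0.37\ \text{mW},
\qquad
E_{\text{op}} = \frac{1}{34.85\times10^{12}\ \text{OP/J}} \approx 28.7\ \text{fJ/OP}.
$$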