In recent years, with the widespread application of artificial intelligence (AI) technologies represented by neural networks in fields such as biomedicine, autonomous driving, target detection, and edge intelligence computing, people's quality of life, productivity, and understanding of the real world have improved rapidly. However, traditional computer hardware based on the von Neumann architecture is constrained by the "memory wall" problem: the separation of storage and computing units makes off-chip memory access the dominant energy and latency overhead of the system, so it cannot meet the ever-growing demand for computing power in AI applications. This has forced the exploration of a new generation of computing paradigms to overcome current limitations.

Among the many new computing architectures, computing-in-memory (CIM) technology stands out. By integrating storage and computing units, it minimizes data movement and maximizes computing bandwidth and energy efficiency, featuring low power consumption, high parallelism, and high throughput. Circuits and systems based on CIM technology therefore hold promising prospects for accelerating AI computing, especially in edge intelligence computing, where resources such as energy and time are constrained. However, current CIM circuits and systems still face many challenges in terms of precision, area, latency, and energy efficiency: (1) analog CIM chips are strongly affected by process, voltage, and temperature (PVT) variations and have limited computing accuracy; (2) modifications to storage cells and interface circuit designs incur excessive area overhead; (3) the energy and latency overhead of data conversion interfaces and off-chip memory access has become the main factor limiting further improvement in system performance.

This article focuses on optimizing the performance of current analog CIM systems from three aspects, "storage-computing circuits," "data conversion 
circuits," and "hardware architecture," for two types of memory, namely static random access memory (SRAM) and resistive random access memory (ReRAM). The innovative research results are as follows:

(1) For ReRAM, a Hybrid Precision and Symbolic Weight CIM (HPSW-CIM) architecture is proposed to achieve the high-density, fully parallel, and energy-efficient multiply-and-accumulate (MAC) operations required by neural network algorithms. The sub-array structure of HPSW-CIM (HPSW-1T1R) implements global-reference negative-coefficient weights through constant-coefficient circuits, reducing the storage space required for signed weights and improving computing density. Multiple HPSW-1T1R sub-arrays can further enhance array integration and mitigate peak bit-line current and voltage-drop (IR-drop) effects through local integration, odd-channel input weight inversion (OCIWI) encoding, and other techniques. A segmented ramp-shared analog-to-digital converter (SRS-ADC) achieves parallel linear conversion of analog signals with minimal circuit overhead, resulting in lower energy consumption and latency. Performance evaluation of the HPSW-CIM architecture with the VGG-8 neural network algorithm shows that it can achieve an energy efficiency of 98.5 TOPS/W with 8-bit inputs, 4-bit weights, and 8-bit outputs.

(2) For SRAM, inspired by the working mechanism of the brain, a CIM circuit architecture without an analog-to-digital converter (ADC), called "BSIC," is proposed. This architecture replaces the ADC with pulse-time-encoding neuron circuits as the readout circuit of the CIM core, ensuring low-power analog data readout while maintaining signal tolerance. BSIC integrates digital adder-tree logic within the memory array to guarantee the parallel computing accuracy of large-scale arrays while enhancing the locality of partial-sum data, reducing the energy and latency overhead caused by data movement. Data precision is flexibly configured through two types of 
shift-add circuits, weighted accumulation and weighted summation, and signed computation is supported. Post-layout simulation under a 0.18 μm process shows that the entire BSIC core takes 370 ns for a single vector-matrix multiplication (VMM) operation with 4-bit inputs, 4-bit weights, and 14-bit outputs, with an energy efficiency of 10.8-13.5 TOPS/W and a computational throughput of 10.24 GOPS.

(3) A charge-domain SRAM computing-in-memory method is proposed. By flexibly configuring switched capacitors, input data conversion, single-bit-weight MAC operations, and a weighted-addition mechanism can all be realized in the charge domain. This approach pushes multi-bit MAC operations entirely into the analog charge domain, avoiding the overhead of high-precision DACs and alleviating the energy and area losses caused by ADCs. Simulation and evaluation under a 28 nm process show that the proposed CIM macro with 1 Mb of memory can achieve an energy efficiency of 80.5 TOPS/W and a computational throughput of 1.6 TOPS while supporting 8-bit VMM computation.

(4) Efficient data conversion interface circuits for reconfigurable CIM systems are proposed: a digital-to-time-to-analog converter (DTAC) and an innovative ramp-type joint-quantized nonlinear ADC (JQNL-ADC). Based on these two efficient interface circuits, a reconfigurable functional module for CIM (RFCIM) is further proposed. Simulation evaluation shows that a CIM macro with 1 Mb of memory and 8-bit integer computation precision can achieve a computational energy efficiency of 112.9 TOPS/W and a computational throughput density of 17 TOPS/mm². When implementing the long short-term memory (LSTM) algorithm for a human activity recognition task, each inference consumes only 3.5 nJ of energy.
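The workload that every architecture above accelerates is the vector-matrix multiplication at the heart of neural network inference. A minimal numerical sketch of one VMM, with illustrative (not source-specified) array dimensions and bit widths, shows what a CIM array computes in a single parallel operation:

```python
import numpy as np

rng = np.random.default_rng(42)
rows, cols = 128, 64                        # illustrative sub-array size
x = rng.integers(0, 16, size=rows)          # 4-bit activation vector
W = rng.integers(-8, 8, size=(rows, cols))  # signed 4-bit weight array

# One vector-matrix multiplication (VMM): each output column is a
# dot product of `rows` multiply-accumulate (MAC) steps. A CIM array
# evaluates all `cols` columns in parallel on its bit-lines instead
# of streaming weights through a separate ALU.
y = x @ W                                   # shape (cols,)
```

Counting 2 × rows × cols operations per VMM (one multiply plus one add per MAC) is the convention behind the TOPS and TOPS/W figures quoted above.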
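The bit-serial shift-add scheme that BSIC uses to build multi-bit precision from single-bit array operations can be sketched behaviorally. The array height, unsigned operands, and LSB-first bit order here are illustrative assumptions; the point is that per-bit adder-tree sums, weighted by powers of two, recover the exact multi-bit dot product (for 64 rows of 4-bit operands the maximum result, 64 × 15 × 15 = 14400, indeed fits in 14 bits):

```python
import numpy as np

rng = np.random.default_rng(0)
rows = 64                                  # illustrative array height
x = rng.integers(0, 16, size=rows)         # 4-bit unsigned inputs
w = rng.integers(0, 16, size=rows)         # 4-bit unsigned weights (one column)

# Bit-serial operation: each cycle applies one input bit-plane to the
# word-lines; the in-array digital adder tree produces an exact partial
# sum, and shift-add logic weights it by 2^b before accumulation.
acc = 0
for b in range(4):                         # input bits, LSB first
    bit_plane = (x >> b) & 1               # 0/1 word-line pattern
    partial = int(np.sum(bit_plane * w))   # exact adder-tree column sum
    acc += partial << b                    # shift-add by the bit weight

assert acc == int(np.dot(x, w))            # matches the full-precision VMM
```

Because the per-bit partial sums are digital and exact, accuracy does not degrade with array size, which is the property the abstract attributes to the in-memory adder-tree logic.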
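The charge-domain MAC of contribution (3) can likewise be modeled behaviorally. In this sketch the capacitor count, unit capacitance, and supply voltage are assumed values chosen for illustration: multiplication is the AND of a 1-bit input with a stored 1-bit weight (a unit capacitor either charges to Vdd or stays at 0 V), and accumulation is passive charge sharing across the column:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 16                                   # capacitors sharing charge (assumed)
C = 1e-15                                # unit capacitance, 1 fF (assumed)
Vdd = 0.9                                # supply voltage in volts (assumed)

w = rng.integers(0, 2, size=n)           # 1-bit weights stored in SRAM
x = rng.integers(0, 2, size=n)           # 1-bit inputs this cycle

# Phase 1: each unit capacitor charges to Vdd only where x_i AND w_i = 1.
q = (x & w) * C * Vdd
# Phase 2: all capacitors are shorted together; charge redistributes, so
# the shared-node voltage encodes the 1-bit MAC result ratiometrically.
v_out = q.sum() / (n * C)                # = Vdd * popcount(x AND w) / n
```

Repeating this per input bit and per weight bit, then combining the results with binary weighting in the charge domain, is how such designs extend the mechanism to multi-bit MAC without a high-precision DAC.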