In recent years, general-purpose processors have been unable to keep improving their performance for emerging applications under a limited power budget. The recent development of neural networks opens a new world for exploring new computing systems. Heterogeneous computing systems that combine a general-purpose processor with a neural network accelerator, and neural-network-accelerator-centric artificial intelligence (AI) computing systems, are two promising solutions that enable higher performance and efficiency for general-purpose computing and AI domain-specific computing, respectively. Neural network accelerator design is central to both kinds of computing systems.

This dissertation points out that neural network accelerator design needs improvements in computation pattern, computing architecture, and memory optimization: modeling the relationship between computation patterns and execution objectives is necessary, so that the best computation pattern can always be selected when the objective or the neural network changes; a dynamically reconfigurable computing architecture is required to adjust the computation pattern for each layer of a neural network to optimize the execution objective (a toy selection loop is sketched below); and high-density memory is needed to deal with the memory bottleneck, while its overhead should be alleviated to maximize the benefit of the increased memory capacity.

This dissertation proposes two design optimization principles for neural network accelerators: architecture design based on computation patterns and dynamic reconfigurability, and memory optimization based on device characteristics and error resilience. Three works are completed under the guidance of these two principles:

(1) RNA, a neural network computing architecture for general-purpose neural approximation. RNA takes minimizing computing latency as its objective. It can dynamically reconfigure its architecture to resolve the mismatch between diverse network topologies and fixed hardware resources, achieving an accelerator speedup of 572x and an application speedup of 7.9x.

(2) DNA, a neural network computing architecture for AI applications. DNA takes maximizing throughput and energy efficiency as its objectives. It can dynamically reconfigure its architecture to realize the proposed hybrid computation pattern and parallel-output-oriented mapping method (a toy mapping example is sketched below), achieving 93% resource utilization, a 3.4x throughput improvement, and one to two orders of magnitude higher energy efficiency than state-of-the-art works. Thinker, a DNA-based AI chip, has been fabricated in 65 nm CMOS technology.

(3) RANA, a retention-aware memory optimization framework. RANA exploits a neural network's error resilience and short data lifetimes to enhance the hardware's tolerance of reduced eDRAM refresh (a toy refresh check is sketched below). With RANA, a neural network accelerator can use eDRAM to increase its on-chip buffer capacity with almost no refresh overhead, saving 41.7% of off-chip memory accesses and 66.2% of system energy consumption.

This dissertation is highlighted by these three works and the two design optimization principles. All three works have been carefully evaluated to demonstrate their practicability. Supported by these works, the two proposed principles have shown their value in guiding neural network accelerator design and will also play a significant role in future neural network accelerator development.
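To make the first principle concrete, the following is a minimal sketch of selecting the best computation pattern for each layer against a chosen objective. The pattern names, layer fields, and cost formulas are illustrative assumptions, not the dissertation's actual models.

```python
# Minimal sketch, assuming a two-pattern accelerator and a toy cost model;
# pattern names and cost weights are illustrative, not RNA/DNA's actual ones.
from dataclasses import dataclass

@dataclass
class Layer:
    name: str
    macs: int     # multiply-accumulate operations (compute cost)
    weights: int  # weight count (drives weight traffic)
    outputs: int  # output activation count (drives output traffic)

def cost(layer: Layer, pattern: str, objective: str) -> float:
    # Each pattern keeps one datatype stationary, so that datatype's
    # traffic is cheap and the other's is expensive (assumed weights).
    if pattern == "weight_stationary":
        traffic = 0.2 * layer.weights + 1.0 * layer.outputs
    else:  # "output_stationary"
        traffic = 1.0 * layer.weights + 0.2 * layer.outputs
    # Latency is assumed compute-dominated; energy traffic-dominated.
    return layer.macs + traffic if objective == "latency" else layer.macs + 2.0 * traffic

def schedule(network: list, objective: str = "latency"):
    """Pick the cheapest pattern per layer; a reconfigurable accelerator
    would switch patterns between layers to follow this schedule."""
    patterns = ("weight_stationary", "output_stationary")
    return [(l.name, min(patterns, key=lambda p: cost(l, p, objective)))
            for l in network]

net = [Layer("conv1", 10**8, 10**4, 10**6),  # convolution: few weights, many outputs
       Layer("fc1", 10**7, 10**7, 10**3)]    # fully connected: many weights, few outputs
print(schedule(net, "latency"))
print(schedule(net, "energy"))
```

Under this toy model the convolutional layer prefers an output-stationary pattern and the fully connected layer a weight-stationary one, which is why a single fixed pattern cannot serve every layer or objective.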
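The utilization gain from an output-oriented mapping can likewise be illustrated with a toy PE-array model. The array shape, layer sizes, and candidate mappings below are assumptions for illustration, not DNA's actual mapper.

```python
# Minimal sketch, assuming a 16x16 PE array; the layer sizes and candidate
# mappings are made up to illustrate the idea, not DNA's actual method.
import math

PE_ROWS, PE_COLS = 16, 16

def utilization(dim_r: int, dim_c: int) -> float:
    """PE utilization when two output dimensions of sizes (dim_r, dim_c)
    are unrolled onto the array; partial edge tiles leave PEs idle."""
    tiles = math.ceil(dim_r / PE_ROWS) * math.ceil(dim_c / PE_COLS)
    return (dim_r * dim_c) / (tiles * PE_ROWS * PE_COLS)

# Example layer: 28x28 output feature map with 96 output channels.
out_h, out_w, out_ch = 28, 28, 96
candidates = {
    "rows<-height,   cols<-width":    utilization(out_h, out_w),
    "rows<-channels, cols<-width":    utilization(out_ch, out_w),
    "rows<-height,   cols<-channels": utilization(out_h, out_ch),
}
for mapping, u in candidates.items():
    print(f"{mapping}: {u:.1%}")
print("best:", max(candidates, key=candidates.get))
```

Even in this toy case, choosing per layer which output dimensions are unrolled onto the array raises utilization from about 77% to about 88%, the kind of layer-by-layer decision that motivates a reconfigurable datapath.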
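RANA's core observation can be shown with a toy refresh check: data whose on-chip lifetime ends before the eDRAM retention time expires never needs a refresh. The retention time and buffer lifetimes below are made-up numbers, not measured values from RANA.

```python
# Minimal sketch, assuming a fixed worst-case eDRAM retention time and
# per-buffer lifetimes from a (hypothetical) dataflow analysis.
RETENTION_MS = 0.045  # assumed worst-case eDRAM retention time

def needs_refresh(lifetime_ms: float, retention_ms: float = RETENTION_MS) -> bool:
    """Data consumed or overwritten before retention expires needs no
    refresh; only longer-lived data would pay refresh energy (and a
    network's error resilience can tolerate occasional retention slips)."""
    return lifetime_ms > retention_ms

buffers = {"input_acts": 0.030, "weights": 0.120, "output_acts": 0.020}
for name, lifetime in buffers.items():
    print(f"{name}: {lifetime} ms ->",
          "refresh" if needs_refresh(lifetime) else "no refresh needed")
```

Because most intermediate activations in a layer-by-layer dataflow are short-lived, almost all refresh operations can be skipped, which is where the reported savings in off-chip access and system energy come from.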