In the field of machine learning and cognitive science, neural networks are mathematical or computational models that mimic the structure and function of biological neural networks and are used to estimate or approximate functions. Several types of deep neural networks, such as convolutional neural networks (CNNs) and recurrent neural networks (RNNs), have been applied to computer vision, natural language processing, speech recognition, and bioinformatics, and have achieved very good results. In particular, convolutional neural networks achieve unprecedented accuracy in tasks such as object recognition, detection, and scene understanding. From AlexNet (an 8-layer network), proposed in 2012, to ResNet (up to 152 layers), proposed in 2015, the computational complexity of neural networks has grown continuously, far exceeding that of traditional methods and placing higher demands on computing hardware. In view of the large computational load, high bandwidth requirements, and high energy consumption of neural network computation in current terminal application scenarios, and in order to further improve the energy efficiency of deep neural networks, increase throughput, and reduce power consumption, this paper analyzes and studies in depth the design and implementation of hardware accelerator ASICs (application-specific integrated circuits) for convolutional neural networks, proceeding from both the algorithm and the architecture. On the basis of improved performance, the circuit structure is optimized to control circuit area and power consumption, thereby improving overall energy efficiency and related metrics. The specific research content of this paper is divided into the following aspects:

(1) First, starting from the neural network's basic unit, the neuron, the PCNN (Pulse-Coupled Neural Network) is adopted as a digital implementation of a neural network. Hardware modeling of the neural unit based on the PCNN model is studied in order to explore
the composition and mechanism of neural networks. Targeting the high performance and low power consumption required by embedded image-processing systems, a VLSI (Very Large Scale Integration) implementation of a two-stage PCNN algorithm for image segmentation is proposed. The first stage of the algorithm, based on a simplified PCNN model, obtains the seeds of each region; in the second stage, the seeds expand over pixels with similar gray levels to grow the regions. In this process, the PCNN parameters can be adjusted adaptively, overcoming the limitations of fixed parameter settings. In the hardware implementation, the two-stage network is partitioned as a pipeline, ping-pong storage is used, and register arrays buffer the transmission of real-time image data. Experimental results show that the processing rate reaches a high throughput of 4.0×10^8 neuron iterations per second, 11% higher than other published work.

(2) Next, taking the CNN algorithm as the entry point, the ASIC design of a hardware accelerator based on the AlexNet convolutional neural network is studied. According to the computational characteristics of AlexNet, a 3×3 convolution operation unit, an on-chip buffer memory structure, an optimized parallel-processing data stream, and an overall coarse-grained spatial architecture are designed, which reduce power consumption and improve overall energy efficiency by reducing accesses to off-chip DRAM. With 16 3×3 convolution processing elements (PEs) exploiting local data reuse, the architecture achieves a peak performance of 144 GOPS at 500 MHz. AlexNet convolution processing reaches 99.2 frames per second, with a power consumption of 264 mW at 500 MHz and 1.0 V. Compared with similar published work, this design achieves 3 times the energy efficiency and 3.5 times the area efficiency.

(3) Building on the previous two parts, the commonalities of other mainstream CNN models such as VGG, GoogLeNet, and ResNet are summarized, and a more general-purpose and
widely applicable hardware accelerator ASIC circuit is designed. A high-performance coarse-grained spatial architecture with an array of 24 3×3 convolution operation units is proposed. The data-flow design of the data register set realizes the regular movement of values, which are transferred to the PEs for computation. For different operations, or for convolutions of different sizes, a command-dispatch unit controls the modules to work together, enhancing flexibility and configurability. The main advantages of this architecture are that the internal area of each PE is optimized, that the number of PEs facilitates high computational efficiency when performing 3×3, 5×5, and 7×7 convolutions, and that the design of the on-chip scratchpad memory cells and data streams reduces redundant data storage in the buffers. A peak performance of 281 GOPS at a power consumption of 859 mW is achieved at 650 MHz and 1.0 V. The throughput on the convolutional layers of the following CNNs is: 179 fps on AlexNet, 76.6 fps on GoogLeNet, and 36.7 fps on ResNet-34. Compared with the AlexNet performance of similar published work, the proposed architecture achieves 1.7 times the energy efficiency, 1.7× to 4.5× the area efficiency, and a 16.4% to 23.7% increase in computational efficiency. The research in this paper, and in particular the hardware structure of the coarse-grained computing unit, provides important guidance for improving the throughput and computational efficiency of convolutional neural network accelerators. The several neural network accelerator hardware circuits designed here achieve real-time processing performance for different application scenarios, and have important application value and broad application prospects.
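To make the neuron model in part (1) concrete, the following is a minimal software sketch of one iteration of a simplified PCNN of the kind typically used for image segmentation. The coupling kernel, parameter values, and the exact simplifications shown here are illustrative assumptions for exposition, not the thesis's actual hardware model:

```python
import numpy as np

def simplified_pcnn_step(S, Y, theta, beta=0.2, V_theta=20.0, a_theta=0.2):
    """One iteration of a simplified PCNN (hypothetical parameters).

    S: input image (stimulus), Y: previous binary pulse output,
    theta: dynamic threshold. Returns the new (Y, theta)."""
    # Linking input: sum of 8-neighbour pulses (zero-padded 3x3 coupling)
    k = np.array([[1, 1, 1], [1, 0, 1], [1, 1, 1]], dtype=float)
    L = np.zeros_like(S)
    P = np.pad(Y, 1)
    for dy in range(3):
        for dx in range(3):
            L += k[dy, dx] * P[dy:dy + S.shape[0], dx:dx + S.shape[1]]
    U = S * (1.0 + beta * L)           # internal activity (feeding = stimulus)
    Y_new = (U > theta).astype(float)  # neurons fire where activity exceeds threshold
    theta_new = np.exp(-a_theta) * theta + V_theta * Y_new  # threshold decay / reset
    return Y_new, theta_new
```

The firing pattern after a few iterations groups pixels of similar gray level, which is the basis of the seed-and-grow segmentation described in part (1).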
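As a sanity check, the reported peak-performance figures in parts (2) and (3) are consistent with each multiplier completing one multiply-accumulate per cycle, counted as two operations. This back-of-the-envelope sketch (the op-counting convention is an assumption, not stated explicitly above) reproduces both values:

```python
def peak_gops(num_pes, macs_per_pe, freq_ghz, ops_per_mac=2):
    """Peak throughput in GOPS, assuming every multiplier in every PE
    completes one MAC (= ops_per_mac operations) each cycle."""
    return num_pes * macs_per_pe * ops_per_mac * freq_ghz

print(peak_gops(16, 9, 0.50))  # 16 PEs x 3x3 at 500 MHz -> 144.0 GOPS
print(peak_gops(24, 9, 0.65))  # 24 PEs x 3x3 at 650 MHz -> 280.8, i.e. ~281 GOPS
```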