
Convolutional Neural Network Accelerator Based On Local Similarity Of Data

Posted on: 2024-01-15 | Degree: Master | Type: Thesis
Country: China | Candidate: Y P Cai | Full Text: PDF
GTID: 2568306932455494 | Subject: Electronic Science and Technology
Abstract/Summary:
In recent years, artificial neural networks have become a hot research direction in the field of artificial intelligence. Thanks to the development of big data and high-performance computing hardware, they have made remarkable achievements in image recognition and object detection. However, as networks grow in scale, their computational complexity keeps increasing, which places higher demands on hardware computing power. Current general-purpose computing architectures find it increasingly difficult to meet the needs of such computationally intensive tasks. Research on neural network accelerators (NNAs), and on convolutional neural network accelerators in particular, has therefore become a hot spot. On the one hand, neural network accelerators design dedicated dataflows and optimize the hardware architecture to improve computing speed and resource utilization; on the other hand, they raise the data utilization rate, thereby improving computing efficiency. Against this background, the main research content of this thesis is a convolutional neural network accelerator based on the local similarity of data. The work of this thesis mainly includes the following aspects.

A zero-gradient approximate convolution algorithm is designed to exploit the spatial local similarity of data. A preprocessing step calculates the gradient sum; based on the gradient sum, the algorithm decides whether to skip the repeated operations on spatially similar data and reuse the temporarily stored convolution results as a reasonable approximation. The PASCAL VOC2012 dataset was used to evaluate the data reuse potential of the AlexNet, ResNet, and YOLOv3 networks, yielding the degree of reusability across different networks and layers and the impact of zero-gradient approximate convolution on accuracy under different thresholds.

On this basis, a dynamic threshold calculation strategy is designed. It ties the threshold to the attributes of the feature map data and finds a relatively appropriate threshold for different networks and different feature maps, so as to avoid either too large an impact on accuracy or too little reuse. In a Python software simulation, YOLOv3 with a threshold coefficient of 0.0001 reduces the amount of computation by 25.5% with an accuracy loss of only 0.8%.

A zero-gradient approximate convolution accelerator architecture is designed, comprising a 16x16 PE array, input and weight caches, an output cache, output preprocessing, a control module, and a gradient processing module. The gradient processing module performs gradient processing on the part of the feature maps that has been read in, decides whether the convolution results are to be reused, records the channel numbers, and allocates the input data of the PE array so that the data of these channels is skipped when the next sliding window is sent. In addition, a dataflow and a PE unit suited to zero-gradient approximate processing are designed to avoid load imbalance. Input data is allocated in units of one input channel, and each row shares one input channel. Compared with the traditional dataflow that allocates data by convolution window, this further improves efficiency. The PE unit uses two buffers, Psum and Ssum, to cache the calculation results of a column and the results worth reusing, respectively, and transmits them column by column, thereby realizing data reuse.

On the XCZU15EG FPGA platform, an object detection system running the YOLOv3-tiny network on the zero-gradient approximate convolution accelerator was implemented, and its functional simulation and performance analysis were carried out. Compared with the case without zero-gradient approximate processing, the number of cycles needed to process an image is reduced by 17.8%. Compared with a GPU and other similar accelerators, the proposed solution consumes less power and achieves high energy efficiency.
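
The zero-gradient approximate convolution and the dynamic threshold described in the abstract can be illustrated with a minimal Python sketch. This is not the thesis's implementation: the abstract does not give the exact gradient definition or threshold formula, so the sketch assumes that the gradient sum is the sum of absolute differences between horizontally adjacent pixels inside the current window, and that the threshold is the threshold coefficient times the feature map's mean absolute value times the window size. The function name `zero_gradient_conv2d` and its parameters are hypothetical.

```python
def zero_gradient_conv2d(fmap, kernel, coeff=1e-4):
    """Single-channel, stride-1, valid 2D convolution with result reuse.

    fmap and kernel are lists of lists of numbers; kernel is square (k x k).
    Returns (output feature map, number of windows whose result was reused).
    """
    h, w = len(fmap), len(fmap[0])
    k = len(kernel)

    # ASSUMPTION: dynamic threshold = coefficient * mean |value| * window size.
    # The thesis only says the threshold is tied to feature map attributes.
    mean_abs = sum(abs(v) for row in fmap for v in row) / (h * w)
    threshold = coeff * mean_abs * k * k

    out, skipped = [], 0
    for i in range(h - k + 1):
        row_out = []
        for j in range(w - k + 1):
            if j > 0:
                # ASSUMPTION: gradient sum = total absolute difference between
                # this window and the previous one (shifted left by one column).
                grad_sum = sum(
                    abs(fmap[i + a][j + b] - fmap[i + a][j + b - 1])
                    for a in range(k) for b in range(k)
                )
                if grad_sum <= threshold:
                    # Windows are nearly identical: reuse the stored result.
                    row_out.append(row_out[-1])
                    skipped += 1
                    continue
            # Otherwise compute the full multiply-accumulate for this window.
            row_out.append(sum(
                fmap[i + a][j + b] * kernel[a][b]
                for a in range(k) for b in range(k)
            ))
        out.append(row_out)
    return out, skipped


if __name__ == "__main__":
    flat = [[1.0] * 4 for _ in range(4)]     # perfectly uniform feature map
    ones = [[1] * 3 for _ in range(3)]
    print(zero_gradient_conv2d(flat, ones))  # second window in each row is reused
```

The sketch only shows the decision logic. In software the gradient check costs about as much as the convolution it replaces; any saving depends on the check being cheap relative to the skipped work, for example when it is done once in preprocessing or in a dedicated hardware module as the abstract describes.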
Keywords/Search Tags: data local similarity, convolutional neural network, zero gradient approximation, FPGA