
Research and Design of a High-Performance CNN Hardware Accelerator

Posted on: 2020-02-11
Degree: Master
Type: Thesis
Country: China
Candidate: J Q Li
Full Text: PDF
GTID: 2428330590974080
Subject: Microelectronics and Solid State Electronics
Abstract/Summary:
The 21st century has entered an intelligent era, and the word "smart" permeates every aspect of life. Intelligent appliances, ranging from ordinary smartphones and smart homes to the recently popular autonomous vehicles, illustrate the significance of smart applications in daily life. For terminal applications to be intelligent, however, the hardware must be intelligent as well, so that inference can be completed on the terminal instead of relying on powerful cloud services. With the development of deep convolutional neural networks (CNNs), higher accuracy can be achieved in many areas, e.g. computer vision, speech, and natural language processing, and terminal devices based on CNNs have high market value. It is therefore of great significance to develop a smart chip specifically oriented to convolutional neural networks.

Efficient computation of convolutional neural networks at the hardware level requires overcoming the huge volume of convolution operations. At the same time, CNN models mostly adopt convolution kernels of different sizes, so the accelerator architecture should support multiple convolution configurations. We have designed a high-performance reconfigurable accelerator mainly for large convolutional neural networks, compatible with convolutions of multiple kernel sizes. Its overall structure consists of a data cache region, a register group, a result cache region, and a core PE (processing element) array region. This high-performance accelerator optimizes classic CNN models, such as AlexNet and GoogLeNet, and supports convolution kernels of 1×1, 3×3, 5×5, 7×7, and 11×11. The architecture comprises 4 channels, each a combination of 24 parallel PEs, with each PE built from 9 multipliers, which keeps the structure compact and reduces area. To keep pace with the high-speed computation of the PE units, the design accesses data from SRAM as much as possible, and a corresponding register group pre-processes data, reducing data redundancy and the associated power consumption.

The architecture was implemented in TSMC 65 nm CMOS technology, reaching a 500 MHz core clock frequency and a peak performance of 864 GOPS. The power consumption at a 1 V supply voltage is 930.9 mW, and the core area is 5.88 mm². Throughputs of 575.9 FPS and 236.4 FPS were achieved on the convolutional layers of AlexNet and GoogLeNet, respectively; average performance reached 766.8 GOPS and 747.8 GOPS, and energy efficiency reached 823.7 GOPS/W and 803.3 GOPS/W, respectively. Compared with similar international designs, the proposed architecture achieves 2.47× the energy efficiency, 1.9× to 5.4× the area efficiency, and 26.95% to 27.35% higher computing efficiency on the AlexNet benchmark.

The accelerator hardware structure proposed in this thesis is of great significance for improving the throughput and computational efficiency of convolutional neural network accelerators. The designed accelerator achieves real-time processing performance and is suitable for numerous terminal products, and therefore has high research value.
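For reference, the peak-performance figure follows directly from the reported array configuration. A minimal sketch (counting one multiply-accumulate as two operations is the usual GOPS convention and an assumption here, as the thesis abstract does not state it):

```python
# Peak performance implied by the reported PE array configuration.
channels = 4          # parallel channels
pes_per_channel = 24  # parallel PEs per channel
macs_per_pe = 9       # multipliers per PE (matches a 3x3 kernel)
clock_hz = 500e6      # 500 MHz core clock
ops_per_mac = 2       # one multiply + one add per MAC (common GOPS convention)

peak_gops = channels * pes_per_channel * macs_per_pe * ops_per_mac * clock_hz / 1e9
print(peak_gops)  # 864.0, matching the reported 864 GOPS peak
```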
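The reported energy-efficiency figures are consistent with the measured average performance and power; a quick cross-check (the division below is my own arithmetic, not taken from the thesis):

```python
# Cross-check: energy efficiency (GOPS/W) = average performance / power.
power_w = 0.9309  # 930.9 mW at a 1 V supply
avg_gops = {"AlexNet": 766.8, "GoogLeNet": 747.8}

for net, gops in avg_gops.items():
    print(net, round(gops / power_w, 1), "GOPS/W")
# AlexNet 823.7 GOPS/W, GoogLeNet 803.3 GOPS/W -- matching the reported figures
```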
Keywords/Search Tags: convolutional neural network, deep learning, GoogLeNet, ASIC, high-performance reconfigurable accelerator