Artificial intelligence, a hot topic in computational science, is gradually moving from research into industrial application. The convolutional neural network (CNN) is a kind of feedforward neural network that has achieved remarkable results in deep learning in recent years. It has been successfully applied to image and video recognition, natural language processing, and targeted advertising, with error rates far better than those of traditional methods. Deep learning, and neural networks in particular, involves huge amounts of data and a complicated computation process. It places high demands on the data bandwidth and computing power of the hardware platform, and power-consumption requirements are especially stringent in mobile and portable applications. These constraints limit its practical deployment.

In this paper, a dedicated accelerator for the convolutional layers of convolutional neural networks is proposed. An internal dedicated data-mapping method is designed to enable parallel computing across different feature-map channels and different convolution kernels in the convolutional layer while reducing the complexity of the control logic. Two corresponding working modes are designed for the different data sizes of the image input layer and the intermediate convolutional layers. To balance storage bandwidth and computing speed, each convolutional layer is subdivided into finer-grained tasks executed in ping-pong fashion, so that the transfer time from external storage to on-chip RAM is covered by computation. Reusing the compute arrays in a reconfigurable manner reduces the required resources and makes full use of the hardware. The MAC (multiply-accumulate) unit can be reconfigured into either four 64-input multiply-accumulate trees or 64 four-input multiply-accumulate trees to adapt to different convolutional-layer data sizes.

Taking the five convolutional layers of the classic AlexNet as the test set, we ran a behavioral simulation of the entire accelerator at realistic IO speed. Thanks to the ping-pong operation, computation accounted for 87.8% of the total run time and data transfer for 12.2%; most of the transfer time was hidden by computation, so the IO bandwidth can match the computation speed of the entire system. The FPGA-based implementation reaches 57.7 GOPS at 160 MHz, with an average MAC-unit utilization of 70.5%. The design method of this CNN convolution accelerator is extensible and offers a useful reference for other, similar designs.
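The benefit of the ping-pong (double-buffered) schedule can be illustrated with a small timing model: while the compute array works on the tile in one buffer, the DMA fills the other, so transfer time is exposed only for the first tile. The tile counts and per-tile costs below are hypothetical illustrations, not the paper's actual parameters.

```python
# Illustrative timing model of ping-pong (double-buffered) execution.
# While the compute array processes one buffer, the next tile is loaded
# into the other, so load time is hidden whenever compute >= load.
# All numbers here are hypothetical, not taken from the paper.

def total_time_serial(n_tiles, t_load, t_compute):
    """Load then compute each tile, with no overlap."""
    return n_tiles * (t_load + t_compute)

def total_time_pingpong(n_tiles, t_load, t_compute):
    """Double buffering: only the first load is exposed; every later
    load overlaps the previous tile's computation."""
    if n_tiles == 0:
        return 0.0
    return t_load + (n_tiles - 1) * max(t_load, t_compute) + t_compute

if __name__ == "__main__":
    n, load, comp = 100, 2.0, 14.0  # hypothetical tile count and cycle costs
    serial = total_time_serial(n, load, comp)
    pingpong = total_time_pingpong(n, load, comp)
    print(f"serial:    {serial:.0f} cycles")
    print(f"ping-pong: {pingpong:.0f} cycles "
          f"(compute fraction {n * comp / pingpong:.1%})")
```

With these toy numbers the transfer cost almost vanishes; the 87.8%/12.2% split reported above reflects the same effect with the real AlexNet layer sizes and IO speed.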
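The reconfigurable MAC array can be pictured as a single pool of 256 multipliers (4 × 64 = 64 × 4) whose products are reduced by adder trees grouped one of two ways. The sketch below is a functional model under that assumption; the paper's actual RTL organization may differ, and the mode names are invented for illustration.

```python
# Functional model of a reconfigurable MAC array: 256 multipliers whose
# products are summed by adder trees grouped either as 4 trees of 64
# inputs (layers with many input channels) or 64 trees of 4 inputs
# (layers with few). A sketch of the idea, not the paper's actual RTL.

N_MACS = 256  # assumed total multiplier count (4 * 64 == 64 * 4)

def mac_array(weights, activations, mode):
    """Multiply element-wise, then reduce within each adder tree.

    mode "4x64": 4 trees of 64 inputs -> 4 partial sums
    mode "64x4": 64 trees of 4 inputs -> 64 partial sums
    """
    assert len(weights) == len(activations) == N_MACS
    products = [w * a for w, a in zip(weights, activations)]
    trees, fan_in = (4, 64) if mode == "4x64" else (64, 4)
    return [sum(products[t * fan_in:(t + 1) * fan_in]) for t in range(trees)]

if __name__ == "__main__":
    w = [1] * N_MACS
    x = list(range(N_MACS))
    print(mac_array(w, x, "4x64"))        # 4 wide partial sums
    print(len(mac_array(w, x, "64x4")))   # 64 narrow partial sums
```

Both modes keep every multiplier busy; only the reduction grouping changes, which is what lets one array serve layers with very different channel counts.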
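The reported throughput and utilization figures are mutually consistent if one assumes the array comprises 256 MAC units (the 4 × 64 configuration) and counts each MAC as two operations (multiply plus add). This is a sanity-check calculation under that assumption, not a figure from the paper:

```python
# Sanity check on the reported numbers, assuming 256 MAC units (4 * 64)
# and 2 operations (multiply + add) per MAC per cycle. Peak throughput
# at 160 MHz would then be 256 * 2 * 0.160 = 81.92 GOPS, and the
# reported 57.7 GOPS corresponds to roughly 70% utilization.
macs = 4 * 64
peak_gops = macs * 2 * 0.160       # 160 MHz expressed in GHz
utilization = 57.7 / peak_gops
print(f"peak {peak_gops:.2f} GOPS, utilization {utilization:.1%}")
```

The result (about 70.4%) agrees with the 70.5% average MAC utilization stated above, supporting the 256-MAC reading of the array size.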