Design And Implementation Of VLIW Accelerator For Deep Learning Convolutional Neural Networks

Posted on:2017-01-25

Degree:Master

Type:Thesis

Country:China

Candidate:R B Shi

Full Text:PDF

GTID:2272330488961998

Subject:Electronic Science and Technology

Abstract/Summary:

The advantages of Convolutional Neural Networks (CNNs) over traditional methods for visual pattern recognition have changed the field of machine vision. The main obstacle to broad adoption of this technique is its massive computing workload, which prevents real-time implementation on low-power embedded platforms. Recently, several leading academic and commercial organizations have proposed dedicated solutions that improve energy efficiency and throughput. However, the large volume of data transfer and memory access involved in the processing remains a challenge.

This thesis describes the following work. First, CNN benchmark layers are collected and optimized, and a fine-grained analysis of the parallelism possibilities for convolutional layers is given. The concept of Intra Output Feature Map parallelism is proposed, together with a novel general solution for CNN data storage. Second, a novel CNN hardware accelerator is presented and its VLIW instruction set is defined; users can implement CNN layers with different parameters through simple VLIW programming. Third, a digital IC verification platform is built on the ZYNQ SoC, and the proposed accelerator architecture is implemented on this platform. Finally, the accelerator is implemented with a 28 nm low-power library. Compared with the state of the art, external memory access is reduced by 50% while similar or better throughput is achieved. The accelerator delivers 102 GOp/s at 800 MHz, occupies 0.303 mm² of silicon area, and has a maximum dynamic power of only 68 mW.
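As a rough cross-check, the headline figures in the abstract can be combined into per-cycle and efficiency metrics. The sketch below is only arithmetic on the quoted numbers. It assumes that one multiply-accumulate counts as two operations and that the throughput and the maximum dynamic power refer to the same operating point; the abstract states neither. The function and constant names are illustrative, not taken from the thesis.

```python
# Figures quoted in the abstract.
THROUGHPUT_GOPS = 102.0        # GOp/s at the quoted clock
CLOCK_MHZ = 800.0              # MHz
AREA_MM2 = 0.303               # mm^2
MAX_DYNAMIC_POWER_MW = 68.0    # mW


def derived_metrics(ops_per_mac=2):
    """Combine the quoted figures into derived metrics.

    ops_per_mac=2 is an assumption (one multiply plus one add per MAC);
    the abstract does not say how operations are counted.
    """
    ops_per_cycle = (THROUGHPUT_GOPS * 1e9) / (CLOCK_MHZ * 1e6)
    return {
        "ops_per_cycle": ops_per_cycle,
        "macs_per_cycle": ops_per_cycle / ops_per_mac,
        "gops_per_mm2": THROUGHPUT_GOPS / AREA_MM2,
        # GOp/s divided by mW equals TOp/s per W.
        "tops_per_watt": THROUGHPUT_GOPS / MAX_DYNAMIC_POWER_MW,
    }


if __name__ == "__main__":
    for name, value in derived_metrics().items():
        print(f"{name}: {value:.2f}")
```

Under these assumptions the figures correspond to about 127.5 operations per cycle (roughly 64 MACs per cycle), about 337 GOp/s per mm², and about 1.5 TOp/s per watt of dynamic power.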

Keywords/Search Tags:

Convolutional Neural Networks, Parallel Computing, Accelerator Chip, Low Power, Deep Learning

Related items

1	Research Insulator Detection And Defect Recognition Based On Deep Convolutional Neural Network
2	Research On Large-scale Vehicle Image Retrieval Based On Convolutional Neural Network
3	The Analysis Of Non-normal Power Line Status Using Deep Learning Algorithms Based On Embedding Parallel Computing Architecture
4	Research On Remote Sensing Image Target Detection Based On Deep Neural Network
5	Fault Diagnosis Method Of Wind Turbine Variable Pitch System With Intelligent Optimization Parallel Filtering
6	Object Recognition And Learning Control Based On Deep Neural Networks For Intelligent Vehicles
7	Research On Deep Learning-Based Classification Of Smart Meter Users
8	Research On Learning Methods For Identifying Typical Underwater Target Images
9	Research On Lane Detection Based On Deep Learning
10	Research On The Application Of Deep Learning In The Detection Of Strain Clamps In Power Equipment