Design And Implementation Of VLIW Accelerator For Deep Learning Convolutional Neural Networks

Posted on: 2017-01-25
Degree: Master
Type: Thesis
Country: China
Candidate: R B Shi
Full Text: PDF
GTID: 2272330488961998
Subject: Electronic Science and Technology
Abstract/Summary:
Convolutional Neural Networks (CNNs) outperform traditional methods for visual pattern recognition and have changed the field of machine vision. The main obstacle to broad adoption of this technique is the massive computing workload of CNNs, which prevents real-time implementation on low-power embedded platforms. Recently, several leading academic and commercial organizations have proposed dedicated solutions to improve energy efficiency and throughput. However, the huge amount of data transfer and memory access involved in the processing remains a challenging issue.

This thesis describes the following work.

First, the CNN benchmark layers are collected and optimized. The thesis then gives a fine-grained analysis of the parallelism available in the convolutional layers. It proposes the concept of intra-output-feature-map parallelism, as well as a novel general solution for CNN data storage.

Second, a novel CNN hardware accelerator is presented. A VLIW instruction set is defined, so users can implement CNN layers with different parameters through simple VLIW programming.

Third, a digital IC verification platform is built on the ZYNQ SoC, and the proposed accelerator architecture is implemented on this platform.

Lastly, the accelerator is implemented with a 28 nm low-power library. Compared with the state of the art, external memory access is reduced by 50% while similar or better throughput is achieved. The accelerator achieves a performance of 102 GOp/s at 800 MHz while occupying 0.303 mm² of silicon area. The maximum dynamic power of the accelerator is only 68 mW.
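
The reported figures can be combined into a few derived metrics that make comparison with other accelerators easier. The sketch below is illustrative arithmetic only: the function and variable names are hypothetical, the abstract does not state how an "operation" is counted, and the power figure is the maximum dynamic power, so static power is not included in the efficiency estimate.

```python
# Figures quoted in the abstract (28 nm low-power implementation).
THROUGHPUT_GOPS = 102.0        # GOp/s
CLOCK_MHZ = 800.0              # MHz
AREA_MM2 = 0.303               # mm^2
MAX_DYNAMIC_POWER_MW = 68.0    # mW, dynamic power only


def derived_metrics(gops, clock_mhz, area_mm2, power_mw):
    """Return (ops per cycle, GOp/s per mm^2, GOp/s per W).

    Hypothetical helper: these ratios are not reported in the abstract,
    they are computed from the four numbers it does report.
    """
    ops_per_cycle = (gops * 1e9) / (clock_mhz * 1e6)
    gops_per_mm2 = gops / area_mm2
    gops_per_watt = gops / (power_mw / 1000.0)
    return ops_per_cycle, gops_per_mm2, gops_per_watt


if __name__ == "__main__":
    per_cycle, per_mm2, per_watt = derived_metrics(
        THROUGHPUT_GOPS, CLOCK_MHZ, AREA_MM2, MAX_DYNAMIC_POWER_MW
    )
    print(f"operations per clock cycle: {per_cycle:.1f}")
    print(f"area efficiency: {per_mm2:.1f} GOp/s/mm^2")
    print(f"efficiency at max dynamic power: {per_watt:.1f} GOp/s/W")
```

Under these assumptions the figures work out to roughly 127.5 operations per clock cycle, about 337 GOp/s per mm², and about 1.5 TOp/s per watt of dynamic power.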
Keywords/Search Tags:Convolutional Neural Networks, Parallel Computing, Accelerator Chip, Low Power, Deep Learning