A Convolutional Neural Networks Accelerator Based On Parallel Memory Technology

Posted on:2018-11-19

Degree:Master

Type:Thesis

Country:China

Candidate:H B Tan

Full Text:PDF

GTID:2428330623450685

Subject:Electronic Science and Technology

Abstract/Summary:

In recent years, deep learning algorithms based on Convolutional Neural Networks (CNNs) have been applied successfully in computer vision and image recognition. They have substantially changed the traditional machine vision framework and have become one of the core algorithms for implementing artificial intelligence. CNN algorithms are not complicated and are highly parallelizable, but the volume of data to be processed is massive. In particular, as deep learning applications develop, ever larger networks have caused a surge in computation. Traditional software acceleration methods cannot meet the performance, power consumption, and real-time requirements of CNN algorithms, so studying hardware acceleration and optimization schemes is of great significance. Based on parallel memory technology, this thesis studies hardware acceleration for the core algorithm of CNNs, the convolutional layer, and designs a dedicated accelerator. The main work is as follows.

Firstly, this thesis analyzes the parallelism and data reusability of the convolutional layer algorithm and determines the accelerator's parallel mechanism across feature maps and within a feature map. Theoretical models are used to analyze the relationship between buffer capacity and off-chip bandwidth, which provides guidance for the on-chip parallel memory design. The overall design of the CNN accelerator is then given. A convolution kernel buffer and an output buffer are designed for the dedicated processing element (PE); they streamline calculation and data fetching and sustain the performance of the PE. The thesis proposes a two-dimensional buffer that supports block access with different strides, which meets the accelerator's need for parallel access and improves data reusability and MAC utilization. Using the AXI bus protocol, the thesis designs a standard user interface and provides library functions for the user to call, which increases the accelerator's versatility.

Secondly, the RTL design of the CNN accelerator is completed, and a testbench platform driven by Matlab-generated stimuli is built to verify the accelerator's functions. The code has been verified with tens of thousands of test stimuli and functions correctly. The design is synthesized and optimized in a 40 nm process. The results show that the accelerator achieves a 1 GHz working frequency with a total area of 4.51 mm², an on-chip buffer of 192 KB, and power consumption of 985 mW. The performance of the accelerator is tested with real applications and mainstream CNN models. The results show that the memory access delay can be completely hidden during calculation in most cases. PE efficiency is over 90% and the computing performance is over 100 GMACs. Compared with other research, the design has less hardware overhead at the same performance.
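
As a rough software illustration of the convolutional layer computation the abstract refers to, the following minimal Python sketch writes the layer as a direct loop nest. It is not the thesis's RTL design or its library interface. The function name `conv_layer`, the nested-list data layout, and the square-kernel restriction are assumptions made only for this example. The comments mark which loops are independent across output maps and within one map, and where a strided block of the input is read.

```python
def conv_layer(inputs, kernels, stride=1):
    """Direct convolution (illustrative sketch, hypothetical helper).

    inputs[c][y][x]      : input feature maps, one per channel c
    kernels[m][c][ky][kx]: one K x K kernel per (output map m, channel c)
    Returns (outputs[m][oy][ox], number of multiply-accumulates performed).
    """
    n_in = len(inputs)
    height, width = len(inputs[0]), len(inputs[0][0])
    k = len(kernels[0][0])  # assumption: square K x K kernels, no padding
    out_h = (height - k) // stride + 1
    out_w = (width - k) // stride + 1
    outputs = [[[0] * out_w for _ in range(out_h)] for _ in kernels]
    macs = 0

    for m, kernel in enumerate(kernels):   # across output maps: iterations are independent
        for oy in range(out_h):            # within one map: output pixels are independent
            for ox in range(out_w):
                acc = 0
                for c in range(n_in):
                    for ky in range(k):
                        for kx in range(k):
                            # Strided block access: the K x K window for this
                            # output pixel starts at (oy * stride, ox * stride).
                            pixel = inputs[c][oy * stride + ky][ox * stride + kx]
                            acc += pixel * kernel[c][ky][kx]
                            macs += 1
                outputs[m][oy][ox] = acc
    return outputs, macs


if __name__ == "__main__":
    image = [[[4 * y + x for x in range(4)] for y in range(4)]]  # one 4 x 4 channel
    ones = [[[[1, 1], [1, 1]]]]                                   # one 2 x 2 kernel
    print(conv_layer(image, ones, stride=2))
```

With stride 1, neighbouring windows overlap, so the same input pixels are read by several output pixels; with a stride equal to the kernel size they do not overlap at all. That overlap is one source of the data reuse that an on-chip buffer can exploit.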

Keywords/Search Tags:

Convolutional layer, Accelerator, Ping-pong buffer, 2D buffer, Processing Element, Strided block access

Related items

1	Design Of FPGA Convolution Neural Network Accelerator Based On HLS
2	Design And Simulation Of 4H-SIC MESFETs Structures In View Of The Improved Buffer Layer
3	On The Design And Performance Analysis Of Transmission Strategies In Buffer-aided Wireless Cooperative Networks
4	Research Of A 7DOF Ping-pang Manipulator's Motion Planning And Control
5	Effect Of Buffer Layer Based On Green Organic Photodiode On Device Characteristics
6	Study On Modulating Performances Of CuPc Based Memory Device By Using LiF As The Buffer Layer
7	Research Of The Mechanism Of Buffer Layer Crystalline On The Properties Of ZnO Films
8	Research On The Key Technology Of Router Buffer For NoC
9	Research On Deposition ZnS Buffer Layer By Chemical Solution Methods
10	Study On Leakage Mechanism Of GaN Buffer Layer In GaN-based Heterostructures