
A Convolutional Neural Network Accelerator Based on Parallel Memory Technology

Posted on: 2018-11-19
Degree: Master
Type: Thesis
Country: China
Candidate: H B Tan
GTID: 2428330623450685
Subject: Electronic Science and Technology
Abstract/Summary:
In recent years, deep learning algorithms based on convolutional neural networks (CNNs) have been applied successfully in computer vision and image recognition. They have substantially changed the traditional machine vision framework and have become one of the core algorithms for implementing artificial intelligence. CNN algorithms are not complicated and are highly parallelizable, but they operate on massive amounts of data. In particular, as deep learning applications develop, ever larger networks have caused a surge in computation. Traditional software acceleration methods cannot meet the performance, power consumption, and real-time requirements of CNN algorithms, so research into hardware acceleration and optimization schemes is of great significance. Based on parallel memory technology, this thesis studies hardware acceleration of the core CNN algorithm, the convolutional layer, and designs a dedicated accelerator. The main work is as follows.

First, this thesis analyzes the parallelism and data reusability of the convolutional layer algorithm and determines the accelerator's parallel mechanism, which exploits parallelism both between feature maps and within each feature map. Theoretical models are used to analyze the relationship between buffer capacity and off-chip bandwidth, which guides the design of the on-chip parallel memory. The overall design of the CNN accelerator is then given. A convolution kernel buffer and an output buffer are designed for the dedicated processing element (PE), which streamlines computation and data fetching and ensures PE performance. This thesis proposes a two-dimensional buffer that supports block access with different strides; it meets the accelerator's parallel-access needs and improves data reuse and MAC utilization. Using the AXI bus protocol, a standard user interface is designed, and library functions are provided for the user to invoke, which increases the accelerator's versatility.

Second, the RTL design of the CNN accelerator is completed, and a testbench platform driven by MATLAB-generated stimuli is built to verify its functions. The code has been verified with tens of thousands of test stimuli, and its function is correct. The design is synthesized and optimized in a 40 nm process. The results show that the accelerator reaches a 1 GHz operating frequency with a total area of 4.51 mm², 192 KB of on-chip buffer, and a power consumption of 985 mW. The accelerator's performance is evaluated with real applications and mainstream CNN models. The results show that, in most cases, memory access latency is completely hidden during computation. PE efficiency exceeds 90%, and computing performance exceeds 100 GMAC/s. Compared with other work, the design requires less hardware overhead for the same performance.
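The abstract reports that memory access latency is hidden during computation in most cases, and the keywords list a ping-pong buffer. As a hedged illustration of why double buffering can achieve this, the minimal Python model below compares a single buffer with a ping-pong pair, assuming the work is split into tiles whose off-chip load time and PE compute time are known in cycles. The names `Tile`, `serial_cycles`, `ping_pong_cycles`, and `pe_efficiency`, and all cycle counts, are hypothetical and chosen for illustration; they are not the thesis's design or measurements. For scale, more than 100 GMAC/s at 1 GHz means that, on average, more than 100 multiply-accumulate operations complete every cycle.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Tile:
    # Hypothetical per-tile cost model (illustrative, not from the thesis).
    load_cycles: int     # cycles to fetch the tile's data from off-chip memory
    compute_cycles: int  # cycles the PE array spends on the tile


def serial_cycles(tiles: List[Tile]) -> int:
    """Single buffer: every load must finish before its compute starts, no overlap."""
    return sum(t.load_cycles + t.compute_cycles for t in tiles)


def ping_pong_cycles(tiles: List[Tile]) -> int:
    """Two buffers: the PEs compute tile i from one buffer while tile i+1
    is loaded into the other, so each stage lasts max(compute_i, load_{i+1})."""
    if not tiles:
        return 0
    total = tiles[0].load_cycles  # the very first load cannot be overlapped
    for i, tile in enumerate(tiles):
        next_load = tiles[i + 1].load_cycles if i + 1 < len(tiles) else 0
        total += max(tile.compute_cycles, next_load)
    return total


def pe_efficiency(tiles: List[Tile], total_cycles: int) -> float:
    """Fraction of cycles in which the PE array is busy."""
    return sum(t.compute_cycles for t in tiles) / total_cycles


if __name__ == "__main__":
    compute_bound = [Tile(load_cycles=200, compute_cycles=1000)] * 8
    memory_bound = [Tile(load_cycles=1500, compute_cycles=1000)] * 8
    for name, tiles in (("compute-bound", compute_bound), ("memory-bound", memory_bound)):
        pp = ping_pong_cycles(tiles)
        print(f"{name}: serial={serial_cycles(tiles)}, ping-pong={pp}, "
              f"PE efficiency={pe_efficiency(tiles, pp):.1%}")
```

In the compute-bound case only the first load is exposed (8200 cycles instead of 9600, roughly 97.6% PE efficiency); in the memory-bound case the PEs still stall (about 61.5% efficiency). This is why, under this simple model, on-chip buffer capacity (tile size and reuse) and off-chip bandwidth have to be sized together.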
Keywords/Search Tags: Convolutional layer, Accelerator, Ping-pong buffer, 2D buffer, Processing Element, Block access with strides
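The 2D buffer and strided block access named above depend on placing data so that every element of one block lands in a different memory bank and the whole block can be read in a single cycle. The thesis's actual address mapping is not given in this abstract, so the sketch below assumes a generic, hypothetical low-order interleaving, bank = (row mod bank_rows, col mod bank_cols), and measures the worst-case number of elements that collide in one bank for a 4×4 block read with strides 1 to 4. The function names are illustrative only.

```python
from collections import Counter


def block_addresses(r0, c0, rows, cols, stride):
    """Element coordinates touched by one rows x cols block access with the given stride."""
    return [(r0 + stride * a, c0 + stride * b) for a in range(rows) for b in range(cols)]


def bank_of(r, c, bank_rows, bank_cols):
    """Hypothetical low-order interleaving: bank chosen by (row mod bank_rows, col mod bank_cols)."""
    return (r % bank_rows, c % bank_cols)


def max_bank_conflict(r0, c0, rows, cols, stride, bank_rows, bank_cols):
    """Largest number of elements of one block access that fall into the same bank.
    A result of 1 means the block is conflict-free (one word per bank)."""
    hits = Counter(bank_of(r, c, bank_rows, bank_cols)
                   for r, c in block_addresses(r0, c0, rows, cols, stride))
    return max(hits.values())


if __name__ == "__main__":
    for banks in (4, 5):
        degrees = [max_bank_conflict(0, 0, 4, 4, s, banks, banks) for s in (1, 2, 3, 4)]
        print(f"{banks}x{banks} banks, strides 1-4: {degrees}")
```

With 4×4 banks, strides 2 and 4 share a factor with the bank count and collide (worst case 4 and 16 accesses to one bank), whereas 5×5 banks keep strides 1 to 4 conflict-free at the cost of extra banks and modulo-5 address logic. Real parallel memories often use skewed or XOR-based mappings to avoid that cost; this sketch only shows why the mapping must be chosen with the supported strides in mind.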