With the development of artificial intelligence and the proliferation of edge devices, the demand for deploying AI algorithms at the edge is growing. Deploying neural networks on processors with a homogeneous architecture is constrained by power consumption and bandwidth, making it difficult to adapt to diverse neural network structures and varied task requirements. Processors based on a heterogeneous architecture, by contrast, can perform massive parallel computation at low power, which makes them well suited to running neural networks on edge devices. This paper therefore uses Xilinx's ZedBoard as a heterogeneous acceleration platform and combines the Winograd fast convolution algorithm with a dynamic-parameter fixed-point quantization strategy to design a heterogeneous accelerator for deploying convolutional neural networks at the edge. The main work of this paper is as follows:

Firstly, this paper summarizes the research background, domestic and foreign development status, basic models, and calculation principles of convolutional neural networks (CNNs) and their heterogeneous accelerators. For the development platform and tools, it selects the HLS (High-Level Synthesis) design flow and the ZedBoard heterogeneous development platform, based on the computational characteristics of CNNs and the performance advantages of heterogeneous platforms. For the accelerator design, it targets the LeNet-5 model and analyzes the parallelism inherent in the network's forward-propagation mechanism and its internal operations. Taking a typical accelerator architecture as an example, it examines the data-transfer process of the heterogeneous accelerator, the allocation of resources between software and hardware, and efficient parallel optimization strategies, laying a theoretical foundation for the accelerator architecture and the embedded-system
development in later chapters.

Secondly, this paper designs a CNN heterogeneous accelerator IP (intellectual property) core and an on-chip embedded system that achieve strong performance at low power. The accelerator IP design rests on three elements: dynamic-parameter fixed-point quantization, a fast convolution algorithm, and a parallelism optimization strategy. Dynamic-parameter fixed-point quantization adjusts the quantization bit width by statistically analyzing the actual distribution range of the parameters, preserving accuracy while reducing storage and computation and improving the stability and robustness of the CNN model. The fast convolution algorithm speeds up convolution operations and reduces system latency; this paper adopts a fast convolution algorithm based on the Winograd transformation, which greatly reduces the number of multiplications in convolution and lowers the computational complexity. The parallelism optimization strategy combines the parallel structure of CNNs with the parallel computing capability of the FPGA, designing optimization schemes suited to each module, such as pipelined loop unrolling, dataflow optimization, and array partitioning for the convolution, pooling, and fully connected operations. For the on-chip embedded system, this paper uses the SDK toolchain to develop the low-level drivers of the CNN acceleration system, completes the full mapping from the CNN model to the ZYNQ accelerator, and exploits the strengths of the heterogeneous platform to allocate software and hardware resources, reducing CPU load and memory usage while shortening CNN inference time.

Finally, this paper takes LeNet-5 as the target algorithm and the ZedBoard as the design platform. The system is tested on the MNIST dataset, and the accelerator is evaluated comprehensively
from four aspects: system accuracy, resource consumption, inference speed, and system power consumption. Experiments show that the accelerator achieves a computing performance of 5.25 GOPS, completes one forward inference in only 5.14×10⁻⁴ s, and consumes 2.6 W. It is 106 times more energy-efficient than the general-purpose AMD Ryzen 5 5600G CPU and 188 times more energy-efficient than the on-board ARM processor, demonstrating its excellent performance.
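The figures above can be cross-checked with simple arithmetic (energy efficiency is taken here as throughput per watt; only the numbers stated in the text are used):

```python
# Consistency arithmetic on the reported figures (values from the text).
gops = 5.25          # reported throughput, GOPS
power_w = 2.6        # reported power consumption, W
latency_s = 5.14e-4  # reported single-inference latency, s

efficiency = gops / power_w                  # ~2.02 GOPS per watt
ops_per_inference = gops * 1e9 * latency_s   # ~2.70e6 operations per forward pass
```

The 106x and 188x ratios then compare this roughly 2 GOPS/W figure against the corresponding GOPS/W of the CPU and the ARM core, whose absolute values are not restated in the abstract.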
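To make the Winograd idea concrete, the 1-D F(2,3) case, which is the building block of the 2-D transform commonly used for 3×3 convolutions, can be sketched as a plain-Python reference model. This is an illustrative sketch only, not the thesis's HLS implementation; function names are chosen for this example:

```python
def winograd_f23(d, g):
    """F(2,3): two outputs of a 3-tap convolution from four inputs,
    using 4 multiplications instead of the direct method's 6."""
    m1 = (d[0] - d[2]) * g[0]
    m2 = (d[1] + d[2]) * (g[0] + g[1] + g[2]) / 2
    m3 = (d[2] - d[1]) * (g[0] - g[1] + g[2]) / 2
    m4 = (d[1] - d[3]) * g[2]
    return [m1 + m2 + m3, m2 - m3 - m4]

def direct_conv(d, g):
    """Direct sliding-window computation, for comparison."""
    return [sum(d[i + j] * g[j] for j in range(3)) for i in range(2)]

d = [1.0, 2.0, 3.0, 4.0]   # input tile
g = [0.5, 1.0, -1.0]       # filter taps
print(winograd_f23(d, g))  # [-0.5, 0.0], matches direct_conv(d, g)
```

Note that the filter-side terms depend only on g, so for a fixed trained kernel they can be precomputed, which is what makes the multiplication savings attractive in hardware.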
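The dynamic-parameter fixed-point quantization described earlier can likewise be sketched as a reference model: the fractional bit width is chosen per tensor from the observed value range. This is a simplified illustration under assumed conventions (signed values, per-tensor scaling); the thesis's exact statistical procedure may differ:

```python
import math

def quantize_dynamic_fixed_point(values, bit_width=8):
    """Pick the fractional bit count from the actual value range,
    then round to signed fixed point with saturation."""
    max_abs = max(abs(v) for v in values)
    # Integer bits needed to cover the range (one bit reserved for sign).
    int_bits = max(0, math.ceil(math.log2(max_abs))) if max_abs > 0 else 0
    frac_bits = bit_width - 1 - int_bits
    scale = 2 ** frac_bits
    qmin, qmax = -(2 ** (bit_width - 1)), 2 ** (bit_width - 1) - 1
    q = [min(max(round(v * scale), qmin), qmax) for v in values]
    return q, frac_bits

def dequantize(q, frac_bits):
    """Recover approximate real values from the fixed-point codes."""
    return [v / (2 ** frac_bits) for v in q]
```

For example, the layer parameters `[0.5, -1.25, 0.75]` at 8 bits get 6 fractional bits (codes `[32, -80, 48]`), whereas a layer with a wider range would automatically trade fractional precision for integer range.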