A Convolutional Neural Network Accelerator For Limited Hardware Computing Resources

Posted on:2021-01-20

Degree:Master

Type:Thesis

Country:China

Candidate:Y T Li

Full Text:PDF

GTID:2428330605981175

Subject:Electronic Science and Technology

Abstract/Summary:

With the rapid development of deep learning, network structures have become deeper and wider, and the amount of computation has grown explosively. The traditional general-purpose processor (CPU) uses the von Neumann architecture; although it is flexible, memory access becomes its bottleneck. The GPU uses a large number of arithmetic logic units (ALUs); although computing efficiency improves, power consumption increases in proportion to throughput. In the embedded field, limited power, bandwidth, and on-chip resources pose a major challenge. Building a dedicated hardware processor for convolutional neural networks (CNNs) has therefore become an active research direction. An ASIC achieves high throughput and low power consumption through a high degree of customization, but its development cycle is longer and the cost of custom chips is higher. An FPGA offers good throughput and power consumption together with a short development cycle and reconfigurability, so it is widely used in the design of CNN accelerators.

We propose a convolutional neural network acceleration architecture for limited hardware resources. While retaining the acceleration effect, it minimizes bandwidth and on-chip resource requirements. Its main innovations are as follows:

1. Mixed-pipeline design. A pipeline structure is used at both the macro and micro levels, making full use of parallelism in time.
2. Pipelined convolution architecture for multiple kernel sizes. The convolution calculation unit supports common convolution kernel sizes and internally uses a 6-stage pipeline to compress the time needed to accumulate intermediate results, so it can produce one result per clock cycle.
3. On-chip cache architecture with high data reuse. Each feature map pixel only needs to be loaded once, so the data is highly reused. There is no need to store complete feature maps on-chip, so on-chip resource requirements are low.
4. Flexible and efficient software and hardware system architecture. Based on the Zynq SoC, the common parameters of the convolutional neural network can be configured flexibly through software, so different network structures can be switched easily.

We implement this architecture on the PYNQ experimental platform, which integrates a Xilinx Zynq-7020 system-on-chip. The processor system (PS) and programmable logic (PL) are connected through the AXI bus to facilitate software and hardware collaboration. We tested the architecture using the MNIST and CIFAR-10 datasets; the results show low bandwidth and on-chip resource requirements, low power consumption, and flexible software switching between the two networks.
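
The data-reuse idea behind the on-chip cache (each feature map pixel is loaded once, and no complete feature map is stored) can be illustrated with a small software model. The sketch below is not the thesis's hardware design; it is a minimal Python illustration that assumes a row-major pixel stream, a square kernel, stride 1, and no padding. The function name `stream_conv2d` and the use of a k-row buffer are assumptions made for this example (a hardware line buffer would more typically hold k-1 rows plus a small sliding window).

```python
from collections import deque


def stream_conv2d(pixels, width, kernel):
    """Valid 2D convolution (CNN-style cross-correlation) over a pixel stream.

    pixels: iterable of values in row-major order (hypothetical input format)
    width:  number of pixels per feature map row
    kernel: k x k list of lists of weights

    Returns (output_rows, loads), where loads counts how many times a pixel
    was read from the input stream.
    """
    k = len(kernel)
    rows = deque(maxlen=k)  # line buffer: never holds more than k rows
    current = []            # row currently being filled from the stream
    out = []
    loads = 0

    for p in pixels:
        loads += 1          # each pixel is read from the stream exactly once
        current.append(p)
        if len(current) < width:
            continue

        # A full row has arrived; the oldest row drops out automatically.
        rows.append(current)
        current = []
        if len(rows) < k:
            continue        # not enough rows buffered for a k x k window yet

        # Slide the k x k window across the buffered rows.
        out_row = []
        for x in range(width - k + 1):
            acc = 0
            for i in range(k):
                for j in range(k):
                    acc += kernel[i][j] * rows[i][x + j]
            out_row.append(acc)
        out.append(out_row)

    return out, loads


# Usage: a 4 x 4 feature map streamed through a 2 x 2 all-ones kernel.
image = list(range(16))
result, loads = stream_conv2d(image, 4, [[1, 1], [1, 1]])
print(result)  # [[10, 14, 18], [26, 30, 34], [42, 46, 50]]
print(loads)   # 16
```

In this model the buffer size depends only on the row width and the kernel size, not on the feature map height, which is the property that keeps on-chip storage small when the full feature map is never held at once.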

Keywords/Search Tags:

Convolutional neural network, PYNQ, hardware acceleration, pipeline, paralle

Related items

1	Research On Neural Network Accelerator Based On PYNQ
2	Research And Implementation Of Convolutional Neural Network Accelerator Based On FPGA
3	Design And Research Of Convolutional Neural Network Accelerator Based On PYNQ Embedded Platform
4	Research And Implementation Of Image Classification And Recognition Technology Based On PYNQ
5	Design Of Convolutional Neural Network Acceleration System And FPGA Verification
6	Hardware Accelerated Design And Simulation Of Convolutional Neural Network
7	Implementation Of Low Precision Neural Network Based On PYNQ
8	Research On Hardware Acceleration Of 3D Convolutional Neural Network Algorithm Based On DSP
9	Research On Hardware Acceleration Based On FPGA Of Convolutional Neural Network And Elliptic Curve Algorithm
10	Research Of Acceleration Technology For Convolutional Neural Networks Based On FPGA