Research On Parallel Computing Architecture Of Multiple CNN Models On FPGA

Posted on: 2021-03-17; Degree: Doctor; Type: Dissertation
Country: China; Candidate: S Zhang; Full Text: PDF
GTID: 1488306470467444; Subject: Software engineering
Abstract/Summary:
Convolutional Neural Networks (CNNs) are an important branch of artificial neural networks. CNNs are a machine learning method that has developed rapidly since the concept of deep learning was proposed in recent years. Unlike traditional rule-based feature extraction, a CNN can directly "learn" the features of specific targets from large-scale input images without human intervention. CNNs have been widely used in image target detection, pattern recognition, machine vision, big data processing, and other fields. With the development of Internet of Things and embedded system technology, terminals capable of intelligent data processing and real-time decision-making have become a trend, and the concept of intelligent edge computing has emerged. The core problem in intelligent edge computing is moving intelligent data processing into the terminal, and neural-network-based machine learning is the way to give embedded devices the ability to recognize and process data intelligently. However, the high computational complexity and large parameter scale of neural networks bring new challenges to embedded devices. FPGAs, with high-density parallel computing ability and low power consumption, are well suited to embedded devices deployed in new intelligent applications. Currently, however, work on optimizing CNNs on FPGA focuses mainly on a single CNN model. As the resources integrated in FPGAs increase, parallel execution of multiple CNN models on one system may become a trend. To meet the future requirement of computing multiple CNN models in one embedded system, this dissertation studies the parallel computing ability of FPGA resources, including computing, logic, and storage resources, and proposes a parallel computing method for multiple CNN models on FPGA. The study starts from a theoretical analysis of binary multiplication and ends with an implementation method for multiple-CNN systems with high performance and low power consumption. The details are as follows:

(1) To solve the problem of low DSP throughput in lower-precision multiplication, a parallel DSP multiplier method is proposed. Based on the theory of binary multiplication and polynomial algebra multiplication, this dissertation studies a parallel DSP multiplier with a precision reservation method based on the nonuniform inputs of a DSP slice. Based on the parallel DSP multiplier, parallel computation of multiple low-precision integer and half-precision floating-point data is then realized. Within the parallel DSP multiplier solution, a parameter searching algorithm is proposed to solve the problem of partial-product overflow, and an optimization model of the parallel multiplier parameters is proposed so that the method can be applied to different DSP multipliers.

(2) To solve the problem of high DSP utilization and low computing accuracy of quantized floating-point CNN models on FPGA, a high-performance parameter quantization method for parallel computing of multiple CNNs is proposed. Based on the parallel DSP multiplier, an 8-bit quantization method for half-precision floating-point data is proposed, which allows two CNNs to be computed in one IP without additional DSP multipliers. Moreover, a new exponent segmented normalization method for half-precision floating-point data is proposed to speed up data format conversion. Combining the two optimizations, a double-CNN parallel computing model is proposed that improves both computing performance and recognition accuracy while reducing DSP resource utilization.

(3) To extend the DDR bandwidth available in a multi-channel CNN system, a high-throughput data sharing strategy is proposed. A novel data broadcasting method for sharing data among multiple CNN IPs is proposed to reduce the data transfer time between DDR and the CNN IPs. Data broadcasting uses the logic resources in the FPGA to broadcast one channel of data to multiple CNN IPs with only one DMA IP. To implement the data broadcasting DMA IP, a configurable multiplexed input switch IP and an asynchronous output switch IP are proposed on the hardware side, and an IP scheduling algorithm is proposed in the software driver.

(4) Based on the above contributions, a parallel computing architecture for multiple CNN models on FPGA is proposed. The design flow of the system includes parallel CNN model training, CNN parameter quantization, CNN IP packaging, and hardware and software system integration. Experimental results show that, compared with the traditional integration method that gives each CNN IP its own exclusive DMA IP, the parallel computing method with the data broadcasting DMA IP significantly reduces system resource usage and power consumption, and that the recognition accuracy of the CNN system on FPGA improves noticeably when two CNN models perform recognition at the same time.

The research above covers multiplication optimization, parameter quantization, and data sharing methods for multiple-CNN systems on FPGA. The experimental results in this dissertation show that these methods are effective and efficient, and they may be able to support future research on multiple-CNN systems on FPGA.
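The idea behind contribution (1), obtaining several low-precision products from one wide multiplier, can be illustrated with a generic operand-packing trick. The sketch below is not the dissertation's multiplier: it assumes unsigned operands, a fixed guard spacing of twice the operand width, and a hypothetical function name `packed_multiply`. It only shows why the spacing between packed operands must be wide enough that one partial product cannot overflow into the other.

```python
from typing import Tuple


def packed_multiply(a: int, b: int, c: int, width: int = 8) -> Tuple[int, int]:
    """Return (a*c, b*c) for unsigned `width`-bit operands using one multiplication.

    Illustrative assumption: operands are unsigned and the two packed fields
    are spaced 2*width bits apart.
    """
    limit = 1 << width
    if not all(0 <= v < limit for v in (a, b, c)):
        raise ValueError(f"operands must be unsigned {width}-bit integers")

    # b*c is always below 2**(2*width), so with this spacing the low product
    # cannot spill into the field that holds a*c.
    shift = 2 * width

    # Pack a and b into one wide operand, then multiply once by the shared c.
    packed = (a << shift) | b
    product = packed * c  # equals (a*c << shift) + b*c

    # Split the wide result back into the two independent products.
    low = product & ((1 << shift) - 1)
    high = product >> shift
    return high, low


if __name__ == "__main__":
    ac, bc = packed_multiply(200, 17, 255)
    print(ac, bc)
```

With a narrower spacing the low product could carry into the high field, which is the overflow problem a parameter search has to rule out. Signed operands would additionally need a correction step that this sketch omits.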
Keywords/Search Tags: Convolutional Neural Network, FPGA, parallel optimization, high performance embedded computing, system architecture