Font Size: a A A

Research On Design Patterns And Applications Of High-Performance Algorithms Based On FPGA

Posted on:2021-03-12Degree:DoctorType:Dissertation
Country:ChinaCandidate:D L LiFull Text:PDF
GTID:1368330623477245Subject:Software engineering
Abstract/Summary:PDF Full Text Request
In recent years,the amount of data to be calculated has increased dramatically with the development of research and application in big data,cloud computing,artificial intelligence and other related fields.The demand for computation power of various computation intensive applications,such as database,intelligent algorithm,deep learning,online prediction and driverless,is far beyond the capacity of tratiditional General-Purpose Processors,i.e.,CPUs.Since the 1960 s,researchers have proposed parallel computating to accelerate algorithms for higher computation performance and efficiency of solving problems.The computation power of parallel computating system improves along with the continuous iteration and enhancement of CPU processing capability and related technology.However,in recent years,semiconductor technology almost reaches the physical limitation.Moore’s law almost fails.On the other side,the demand for computation power is still increasing with the growth of data.In the 21 st century,heterogeneous computation gradually becomes an effective way for the improvement of computation power.FPGAs are programmable chips,and directly translate algorithm logic into combination of transistor circuits,without instructions and softwares.This makes FPGAs perform better than General-Purpose Processors in computation speed,delay and power consumption.FPGAs play important roles in many application scenarios and become research hotspot in the field of heterogeneous computation.However,they also face challenges in algorithm design methods based on FPGAs.(1)The algorithm design based on FPGAs is circuit oriented,which requires deep understanding of FPGAs architecture and digital circuits.At present,most of the algorithm developers are software developers,lack of understanding of hardware.The algorithm design tool chains based on FPGAs are not perfect,resulting in low actual development efficiency on FPGAs platforms.(2)Most of the existing algorithm design and performance optimization methods are for general processor architecture.Due to the difference in hardware architecture between FPGAs and general processors,these methods can not give full play to the customizable characteristics of FPGAs.Therefore,the algorithm performance improvement on FPGAs are not so good as we expect.At present,there is still lack of algorithm design and performance optimization methods suitable for FPGA.(3)When implemented on FPGAs,the implementations of specific algorithms need to be deeply optimized according to the characteristics of FPGAs hardware architecture.Most of the existing optimization ideas are from the perspective of algorithm model,lacking of consideration of problems to be solved and the hardware architecture of FPGAs.Fully considering the above problems,this paper proposes high-performance algorithm implementation design and optimization methods on FPGAs.The main work is as follows:(1)The design pattern of high-performance algorithm for FPGAs and the evaluation criteria of computation performance are proposed.The control logic and operation of algorithm are transformed into the connection of circuit logic unit on FPGAs.The data to be operated flows through FPGAs according to the connection mode of circuits to produce the final results.The combination of circuits affects the flow of data and ultimately the performance of computation.Therefore,the performance optimization of the algorithm on FPGAs should aim at constructing efficient data stream.In this paper,we propose an algorithm implementation design pattern,which aims at constructing the streaming data of algorithm.By abstracting the efficient circuit model on FPGAs into the data flow model of algorithm,we shield the hardware structure details for designers and improve development efficiency.It is easier to improve the performance of the implemented algorithm as long as designers refer to the design pattern.The design pattern is only design reference.It has strong universality without requirement of supported by specific programming language or synthesis tools.It consists of three levels.Firstly,in the level of algorithm framework,for the streaming data of algorithm,it provides a multi-level pipeline "map-reduce" framework and a systolic linear framework.Secondly,in the functional level of algorithm,it provides a complex data type efficient pipeline summation tree and a parallel comparison vector.Lastly,in the level of algorithmic logic,it provides a variety of algorithm performance improvement methods for simplification control logic and reduction operation intensity.In addition,the performance evaluation of algorithm on FPGAs can not only consider program execution time,but it also has to consider many other aspects,such as,delay,frequency,throughput,utilization and power consumption.Therefore,this paper proposes performance evaluation criteria for the algorithm implementation on FPGAs,including acceleration evaluation criteria and performance evaluation equations for HLS.The design pattern and performance evaluation criteria proposed in this paper are of great significance to improve the performance of implementation on FPGAs.(2)This paper presents a linear sorting algorithm based on the extended non-strict partial sequence for FPGAs structures.We implement the sorting algorithm as a configurable linear sorter with the systolic linear frame design pattern.Sorting problem is a widely studied algorithm problem.However,most of the state-of-the-art sorting algorithms on FPGAs are implemented through transplanting the parallel designed classical sorting algorithm to FPGAs.Although the sorting delay decreases,the performance in resource utilization and other aspects still needs to be improved.In this paper,fully considering the hardware structure,we first extend the non-strict partial order relation to the non-strict partial order relation based on N tuples based on the order theory in math.Then,the linear sorting algorithms is proposed based on it.The proposed algorithm has time complexity of 4N/n.The bandwidth and number of comparison of the algorithm can be adjusted by n.The sorter on FPGAs has some optimized characteristics,such as,low resource utilization,low circuit connection complexity,configurable input bandwidth,sorting delay.Therefore,the designers can make trade-off between the delay and resource utilization according to the requirements of specific problems.The overall performance for the sorting problems is improved.Because the algorithm is designed for FPGA hardware architecture,we use the absolute acceleration ratio to evaluate the performance of the algorithm.Experimental results show that the algorithm has better efficiency than the fast sorting algorithm implemented on CPU.(3)A general framework for the implementation of swarm intelligence algorithm based on HLS is proposed.Swarm intelligence algorithms(SIAs)are computation intensive and proposed to solve optimization problems.The state-of-the-art work usually optimizes SIAs only by improving the spatial parallelism of SIAs,which leads to low throughput and limited solution scale on FPGAs.The framework proposed in this paper is based on the multi-level pipeline "map-reduce" framework design pattern,which matches the algorithm data flow with the hardware structure,and fully considers the memory architecture of the hardware platform,so as to obtain better parallel characteristics and greater throughput.The framework is based on HLS and described by C++ language.It can be deployed on different hardware platforms(FPGAs,GPUs and multi-core CPUs).In order to improve the efficiency of memory access,we optimize the framework according to the characteristics of each hardware platform.We take quantum behaved particle swarm optimization(QPSO)as the example to test the framework.We use the relative acceleration method to compare the performance of the framework on different platforms.Experimental results show that the framework achieves better performance than other state-of-the-art work,and the FPGA implementation has better computation efficiency than other platforms.
Keywords/Search Tags:Field Programmable Gate Arrays (FPGAs), Design Pattern, Computational Performance Evaluation Criteria, Linear Sorting Algorithm, Swarm Intelligence Algorithm
PDF Full Text Request
Related items