Graph processing is one of the main processing modes in big data and is widely used in business scenarios such as social network analysis, web search, and product recommendation. Graph processing demands high performance, and hardware acceleration is a promising way to achieve it. Conventional hardware-based graph processing requires engineers to understand the underlying hardware details in depth, which makes development difficult, imposes a steep learning curve, and lengthens the development cycle. Traditional general-purpose High-Level Synthesis (HLS) tools can automatically convert high-level language code into hardware description language code. However, graph processing is characterized by irregular computation and random memory access, and in this setting traditional HLS lacks efficient dynamic scheduling, cannot avoid data conflicts, and cannot flexibly adapt to multiple graph processing execution models, which severely limits the achievable acceleration. To address these problems, an efficient layout mechanism based on dataflow circuits and a high-level synthesis dynamic scheduling method for graph processing are proposed. Together they achieve dynamic scheduling and load balancing for graph processing and effectively reduce the pipeline stalls caused by data conflicts. Dynamic adaptive parameter provisioning and accelerator template optimization methods are also proposed for different graph processing execution models, enabling the high-level synthesis system to adapt flexibly to each execution model and gain performance benefits. Specifically, a graph-aware elastic circuit mechanism provides dynamic awareness of the graph processing workload and efficient task scheduling. For both the push and pull modes of vertex-centric processing, fine-grained data conflict avoidance is implemented through a runtime dependency prediction method. Because the edge-centric graph processing model performs better in the dense phase, an efficient caching mechanism with contiguous access is designed to improve the locality of data access. The effectiveness of the proposed approach is validated on a variety of graph processing algorithms and datasets through real-world deployments on Xilinx Alveo U250 FPGA accelerator cards, where it improves performance by 2.16 times compared with Spatial, a typical general-purpose HLS system.
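To make the data-conflict problem concrete, the following is a toy software model, not the authors' elastic-circuit design. It assumes a read-modify-write pipeline of fixed depth that accepts one vertex update per cycle, and it models "runtime dependency" checking as a simple set of vertices whose updates are still in flight. The function name `simulate_rmw_pipeline` and its parameters are hypothetical. With `window=1` it behaves like a statically scheduled in-order pipeline that must stall on every conflict; with a larger window it issues the oldest non-conflicting update instead. Reordering updates to *different* vertices is assumed to be safe, which holds for commutative reductions such as min (BFS/SSSP) or sum (PageRank).

```python
from collections import deque
from typing import List, Tuple


def simulate_rmw_pipeline(updates: List[int], depth: int, window: int) -> Tuple[int, int, List[int]]:
    """Toy model (hypothetical): issue at most one vertex update per cycle.

    updates: destination vertex IDs in arrival order (e.g. from push-mode edges).
    depth:   cycles from issuing an update until its write-back completes.
    window:  number of pending updates the scheduler may inspect; window=1 models
             a static in-order pipeline, window>1 a simple dynamic scheduler.
    Returns (cycle of the last write-back, stall cycles, issue order).
    """
    pending = deque(updates)
    in_flight = deque()  # (finish_cycle, vertex), ordered by finish_cycle
    busy = set()         # vertices with an update still in the pipeline
    cycle, stalls, last_finish, order = 0, 0, 0, []
    while pending:
        # Retire updates whose write-back has completed by this cycle.
        while in_flight and in_flight[0][0] <= cycle:
            busy.discard(in_flight.popleft()[1])
        # Pick the oldest update in the window whose vertex is not in flight.
        # Updates to the same vertex are never reordered: a later update to a
        # busy vertex is blocked exactly like the earlier one.
        chosen = next((i for i in range(min(window, len(pending)))
                       if pending[i] not in busy), None)
        if chosen is None:
            stalls += 1  # every candidate conflicts, so the pipeline bubbles
        else:
            v = pending[chosen]
            del pending[chosen]
            busy.add(v)
            last_finish = cycle + depth
            in_flight.append((last_finish, v))
            order.append(v)
        cycle += 1
    return last_finish, stalls, order


if __name__ == "__main__":
    edges_dst = [0, 0, 1, 2, 0, 3]
    print(simulate_rmw_pipeline(edges_dst, depth=3, window=1))  # (10, 2, [0, 0, 1, 2, 0, 3])
    print(simulate_rmw_pipeline(edges_dst, depth=3, window=4))  # (9, 1, [0, 1, 2, 0, 3, 0])
```

In this small example the in-order pipeline stalls twice waiting on vertex 0, while the windowed scheduler fills those slots with updates to vertices 1, 2, and 3. The model ignores memory latency variation and load balancing across processing elements, so it only illustrates why dynamic issue reduces conflict-induced stalls; it does not reproduce the reported speedup.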