An Efficient Communication Method For Large-scale Graph Processing In Data Centers

Posted on: 2022-10-23   Degree: Master   Type: Thesis
Country: China   Candidate: Y W Wu   Full Text: PDF
GTID: 2480306572991189   Subject: Computer software and theory
Abstract/Summary:
With the rapid growth of real-world graphs, which can easily exceed the on-chip (or on-board) storage capacity of an accelerator, analyzing large-scale graphs on a single FPGA-based graph processing accelerator becomes difficult. Multi-FPGA acceleration is therefore necessary. Many cloud providers (e.g., Amazon, Microsoft, and Baidu) now expose FPGAs to users in their data centers, providing opportunities to accelerate large-scale graph processing. However, extending existing single-FPGA graph accelerators to a multi-FPGA graph processing system in the data center faces two main challenges. First, because existing single-FPGA graph accelerators come with customized programming models, runtime systems, and communication runtimes, it is difficult to reuse their infrastructure to build new distributed accelerators. Second, when a distributed graph accelerator running in the data center does not account for the characteristics of the torus interconnection scheme, it incurs substantial unnecessary communication overhead. This thesis presents FDGLib, a communication library for efficient large-scale graph processing on FPGA-accelerated data centers, which can scale out any existing single-FPGA graph accelerator to a distributed version in a data center with minimal hardware engineering effort. FDGLib provides six APIs that can be integrated into any FPGA-based graph accelerator with only a few lines of code modification. Considering the torus-based FPGA interconnection in data centers, FDGLib also improves communication efficiency using simple yet effective torus-friendly graph partition and placement schemes. We integrate FDGLib into HitGraph, a state-of-the-art graph accelerator. Our results on a 32-node Microsoft Catapult-like data center show that the distributed HitGraph can be 2.32× and 4.77× faster than the state-of-the-art distributed FPGA- and CPU-based solutions (i.e., ForeGraph and Gemini), respectively, with better scalability.
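
To illustrate why placement matters on a torus interconnect, the following is a minimal sketch, not FDGLib's actual scheme or API. It assumes a hypothetical 4 x 8 two-dimensional torus (the abstract only states 32 nodes), row-major node numbering, and a made-up traffic table between graph partitions. The function names `torus_hops` and `communication_cost` are illustrative only.

```python
def torus_hops(a, b, rows, cols):
    """Minimum hop count between nodes a and b on a rows x cols 2D torus.

    Nodes are numbered row-major (an assumption for this sketch).
    Wrap-around links let a message travel in either direction per dimension.
    """
    ar, ac = divmod(a, cols)
    br, bc = divmod(b, cols)
    dr = abs(ar - br)
    dc = abs(ac - bc)
    return min(dr, rows - dr) + min(dc, cols - dc)


def communication_cost(traffic, placement, rows, cols):
    """Total traffic volume weighted by torus hop distance.

    traffic:   dict mapping (partition_p, partition_q) -> message volume
    placement: list where placement[p] is the node hosting partition p
    """
    return sum(
        volume * torus_hops(placement[p], placement[q], rows, cols)
        for (p, q), volume in traffic.items()
    )


# Hypothetical example: three partitions that exchange data in a chain.
ROWS, COLS = 4, 8
traffic = {(0, 1): 10, (1, 2): 10}

scattered = [0, 20, 7]   # partitions placed far apart on the torus
adjacent = [0, 1, 2]     # partitions placed on neighboring nodes

cost_scattered = communication_cost(traffic, scattered, ROWS, COLS)
cost_adjacent = communication_cost(traffic, adjacent, ROWS, COLS)
```

Under these assumptions, placing heavily communicating partitions on neighboring nodes gives a lower hop-weighted cost than scattering them, which is the general intuition behind a torus-aware placement.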
Keywords/Search Tags: Data Center, Accelerator, Graph Processing, Distributed Architecture, Communication Optimization