Optimization Technologies For High-Performance Concurrent Graph Processing

Posted on:2023-12-15

Degree:Doctor

Type:Dissertation

Country:China

Candidate:J Zhao

Full Text:PDF

GTID:1520307172952489

Subject:Computer software and theory

Abstract/Summary:

PDF Full Text Request

Graphs can capture the relationships between entities,thus it is widely used in many real-world applications,e.g.,social networks,circuit graph,and e-commerce transaction graph,which provides opportunities for utilizing graph processing to support data analysis.Due to the importance of graph processing,many graph algorithms(called concurrent graph processing jobs)are running concurrently on the same graph processing platform to analyze its daily graph data for different products and services.In addition to the same characteristics of traditional graph processing,concurrent graph processing jobs suffers from the problems of serious resource and data competition.Thus,how to efficiently support the execution of massive concurrent graph processing jobs and quickly analyze various meaningful information from the same graph has become an urgent problem.Currently,many graph processing accelerators and software systems are developed.However,because existing solutions do not take into account the potential locality and similarity among the data accesses of the concurrent graph processing jobs,they suffer from the problems of irregular graph traversal,significant redundant memory accesses,and poor communication performance when supporting concurrent graph processing jobs,which results in high data access overhead and low utilization of hardware resources.To solve these challenges,a number of optimization techniques are proposed to efficiently support the execution of concurrent graph processing jobs.First,a locality-centric concurrent graph processing accelerator is proposed to improve the memory access efficiency of traditional architectures.Because different graph processing jobs traverse the same graph along the same graph topology,there is a strong data locality between these jobs.Based on this observation,it explores the dependencies among the active vertices of all concurrent jobs,and then enables these concurrent jobs regularly traverse and process the same graph data along the graph topology,so as to regularize their graph traversals.Besides,it effectively consolidates the accesses to the vertex states for concurrent jobs,achieving better data locality.The results show that our proposed accelerator achieves up to 11.3 times speedup compared with the state-of-the-art hardware graph processing accelerators,i.e.,HATS,Minnow,and PHI.Second,an out-of-core concurrent graph processing system is proposed to reduce the redundant storage and data access cost,improving the utilization ratio of the hardware resources.Specifically,it first proposes a novel Share-Synchronize mechanism,which can serve multiple concurrent jobs by only loading and storing a copy of the same graph structure data in cache and memory,achieving higher utilization of the memory bandwidth and cache.Then,it further proposes the structure-aware graph repartitioning and buffering strategy,which adaptively loads the graph data required by concurrent jobs and buffers the frequently-used graph data in the main memory,so as to improve the utilization of loaded graph data and minimize I/O overhead.The results show that the proposed system improves the throughput by up to 13 times compared with the cutting-edge out-of-core graph processing system,i.e.,Grid Graph,Graph Chi,and X-Stream.Third,a similarity-aware distributed concurrent graph processing system is proposed to efficiently serve the concurrent jobs on the distributed platform.Based on the observed strong similarity among the concurrent jobs,it designs a similarity-aware execution model,which assigns the partitions to be handled by these jobs based on their topological order.Then,a novel communication scheme is proposed to regularize the communications for the concurrent jobs,thereby reducing their communication overhead.Besides,an incremental load-balancing strategy and a novel graph repartitioning scheme are proposed to improve the utilization of hardware resources on the distributed platform and ensure execution efficiency when handing dynamic graphs.The results show that the proposed system improves the throughput of the concurrent graph processing jobs by up to 6 times in comparison with existing solutions on distributed platform,i.e.,Gemini and Seraph.In general,this dissertation focuses on the problems of the irregular graph traversals of concurrent graph processing jobs,the redundant overhead of the data accesses between concurrent graph processing jobs,and the poor scalability of distributed concurrent graph processing.Several optimization techniques are proposed to significantly improve the execution performance of concurrent graph processing jobs.

Keywords/Search Tags:

Big Data, Concurrent Graph Processing Jobs, Hardware Acceleration, Out-of-Core Graph Processing, Distributed Graph Processing, High Performance, Throughput

PDF Full Text Request

Related items

1	Research On Several Distributed Graph Processing Algorithms And A Unified Graph Programming Framework
2	A High Performance Graph Processing System Based On Low-redundancy Iteration Model
3	Research On High Performance Large-Scale Graph Analytic Middleware System
4	Optimization On Partitioning Methods And Processing Workflow Of Distributed Graph-Processing Systems
5	Research On Distance Computing In Distributed Graph Data Management System
6	Distributed Multi-task Learning Algorithms Based On Graph Signal Processing
7	Design And Implementation On Distributed Out-of-core Graph Processing System Based On One-sided RDMA
8	An Efficient Communication Method For Large-scale Graph Processing In Data Centers
9	Graph Analytics on Modern Graph Processing System
10	Anomaly Detection Algorithms Based On Graph Signal Processing