| Graphs can capture the relationships between entities,thus it is widely used in many real-world applications,e.g.,social networks,circuit graph,and e-commerce transaction graph,which provides opportunities for utilizing graph processing to support data analysis.Due to the importance of graph processing,many graph algorithms(called concurrent graph processing jobs)are running concurrently on the same graph processing platform to analyze its daily graph data for different products and services.In addition to the same characteristics of traditional graph processing,concurrent graph processing jobs suffers from the problems of serious resource and data competition.Thus,how to efficiently support the execution of massive concurrent graph processing jobs and quickly analyze various meaningful information from the same graph has become an urgent problem.Currently,many graph processing accelerators and software systems are developed.However,because existing solutions do not take into account the potential locality and similarity among the data accesses of the concurrent graph processing jobs,they suffer from the problems of irregular graph traversal,significant redundant memory accesses,and poor communication performance when supporting concurrent graph processing jobs,which results in high data access overhead and low utilization of hardware resources.To solve these challenges,a number of optimization techniques are proposed to efficiently support the execution of concurrent graph processing jobs.First,a locality-centric concurrent graph processing accelerator is proposed to improve the memory access efficiency of traditional architectures.Because different graph processing jobs traverse the same graph along the same graph topology,there is a strong data locality between these jobs.Based on this observation,it explores the dependencies among the active vertices of all concurrent jobs,and then enables these concurrent jobs regularly traverse and process the same graph data along the graph topology,so as to regularize their graph traversals.Besides,it effectively consolidates the accesses to the vertex states for concurrent jobs,achieving better data locality.The results show that our proposed accelerator achieves up to 11.3 times speedup compared with the state-of-the-art hardware graph processing accelerators,i.e.,HATS,Minnow,and PHI.Second,an out-of-core concurrent graph processing system is proposed to reduce the redundant storage and data access cost,improving the utilization ratio of the hardware resources.Specifically,it first proposes a novel Share-Synchronize mechanism,which can serve multiple concurrent jobs by only loading and storing a copy of the same graph structure data in cache and memory,achieving higher utilization of the memory bandwidth and cache.Then,it further proposes the structure-aware graph repartitioning and buffering strategy,which adaptively loads the graph data required by concurrent jobs and buffers the frequently-used graph data in the main memory,so as to improve the utilization of loaded graph data and minimize I/O overhead.The results show that the proposed system improves the throughput by up to 13 times compared with the cutting-edge out-of-core graph processing system,i.e.,Grid Graph,Graph Chi,and X-Stream.Third,a similarity-aware distributed concurrent graph processing system is proposed to efficiently serve the concurrent jobs on the distributed platform.Based on the observed strong similarity among the concurrent jobs,it designs a similarity-aware execution model,which assigns the partitions to be handled by these jobs based on their topological order.Then,a novel communication scheme is proposed to regularize the communications for the concurrent jobs,thereby reducing their communication overhead.Besides,an incremental load-balancing strategy and a novel graph repartitioning scheme are proposed to improve the utilization of hardware resources on the distributed platform and ensure execution efficiency when handing dynamic graphs.The results show that the proposed system improves the throughput of the concurrent graph processing jobs by up to 6 times in comparison with existing solutions on distributed platform,i.e.,Gemini and Seraph.In general,this dissertation focuses on the problems of the irregular graph traversals of concurrent graph processing jobs,the redundant overhead of the data accesses between concurrent graph processing jobs,and the poor scalability of distributed concurrent graph processing.Several optimization techniques are proposed to significantly improve the execution performance of concurrent graph processing jobs. |