Optimization And Application Of Connected Component Algorithm For Large Scale Graph Computing

Posted on:2022-12-23

Degree:Master

Type:Thesis

Country:China

Candidate:H Bai

Full Text:PDF

GTID:2530307169483284

Subject:Engineering

Abstract/Summary:

PDF Full Text Request

With the development of science and technology,mankind has entered the era of big data.Among big data,graph is an important part.Whether in a single-machine or distributed system,the performance requirements for graph computing are getting higher and higher.In graph algorithms,connected components(CC)is one of the most popular methods.As a hot research direction,it has received much attention.However,in the current implementation of the CC,serial and parallel ones both have some shortcomings.For example,there is not much research on the compatibility between algorithms and data structure,and the sub-algorithms’ contact are not close.How to overcome shortcomings and improve the performance of the CC,has become an urgent problem.In this paper,the various implementation schemes of the CC are deeply studied and the advantages and disadvantages of each method are summarized.We analyze their applicable scenarios,and the research focuses on the adaptability of algorithms&data structures and the coordination between algorithms.The main research work of this paper is as follows.Firstly,aiming at the blind spots of existing CC in single-machine systems,we propose the shortcutting-vector algorithm.We analyze the existing CC.Although there is much classic optimization for the Union-Find algorithm,but it only stays in the algorithm itself and does not consider the data structure togother.In response to the problem,we propose a shortcutting-vector algorithm.It fully combines the advantages of each optimization.The shortcutting is used to realize methods such as union-by-size and path compression,and vector is used to store each vertex in the CC.We have studied the good synergy produced by the combination of shortcutting and vector.Moreover,we decompose it into multiple subtasks for parallel.They are executed by multiple threads for data preprocessing,which speeds up the algorithm.Take our experimental results under the largest data set as an example.(1)Shortcutting-vector algorithm’s performance is1.38× and 1.40× that of other optimization.(2)The time and memory is 21.0%,21.3%and 4.2%,4.3% of the traversal algorithm.(3)Parallel algorithm has 2.57× speedup.Secondly,aiming at the problem of CC in distributed system,we propose TianheCC.We find that BFS can get only one CC at a time,the search of small CCs is slow,and the result contains redundancy.Although the S-V algorithm has better performance for small CCs,it needs to re-establish the storage structure,which causes overhead.Given this problem,our algorithm reuses the storage structure of the two sub-algorithms.For the BFS algorithm,by global vertices scattering and PV bitmap,the communication is reduced and the load balance is promoted.For our proposed 2-Level Broadcast algorithm,broadcasting eigenvalues is divided into two levels to reduce communication.The lockfree impact on the correctness is also resolved by adjusting sequence.(1)The total time,construction time,calculation time,and memory of Tianhe-CC are 64.6%,6.1%,76.9%and 15.0% of Gemini respectively.And it can also support 128 processes.(2)Tianhe-CC increases the speed of Graph500 by 5.60×.The procedure optimized by Tianhe-CC is ranked No.1 on the Graph500 SSSP Ranking and in the BIG Data category of the Green Graph 500.And it is ranked No.2 in the BIG Data category of the Green Graph 500.So Tianhe-CC contributes to Tianhe’s ranking.

Keywords/Search Tags:

graph computing, connected components, vector, shortcutting, Tianhe-CC

PDF Full Text Request

Related items

1	Research On The Discovery Methed Of Steiner Components Bsased On Data Graph
2	Every3-connected Essentially9-connected Line Graphs Is Hamilton-connected
3	Z₃-Connectivity Of 4-Edge Connected Triangle Graph
4	Research On The Parallel Query And Analysis Technology Of K-edge-connected Subgraph Over Large Graphs
5	Several Sufficient Conditions Of Hamilton-Connected Graphs
6	The Contraction Edges In The K-connected Graphs
7	Accelerating Graph Traversal By Graph Ordering
8	Research And Implementation Of High-throughput Computing For High-performance Computing Systems
9	Partitioning Of Connected Subgraphs In Vertex-weighted Graph
10	Research And Implementation Of Parallel Computer Architecture For Graph Search