| With the development of science and technology,mankind has entered the era of big data.Among big data,graph is an important part.Whether in a single-machine or distributed system,the performance requirements for graph computing are getting higher and higher.In graph algorithms,connected components(CC)is one of the most popular methods.As a hot research direction,it has received much attention.However,in the current implementation of the CC,serial and parallel ones both have some shortcomings.For example,there is not much research on the compatibility between algorithms and data structure,and the sub-algorithms’ contact are not close.How to overcome shortcomings and improve the performance of the CC,has become an urgent problem.In this paper,the various implementation schemes of the CC are deeply studied and the advantages and disadvantages of each method are summarized.We analyze their applicable scenarios,and the research focuses on the adaptability of algorithms&data structures and the coordination between algorithms.The main research work of this paper is as follows.Firstly,aiming at the blind spots of existing CC in single-machine systems,we propose the shortcutting-vector algorithm.We analyze the existing CC.Although there is much classic optimization for the Union-Find algorithm,but it only stays in the algorithm itself and does not consider the data structure togother.In response to the problem,we propose a shortcutting-vector algorithm.It fully combines the advantages of each optimization.The shortcutting is used to realize methods such as union-by-size and path compression,and vector is used to store each vertex in the CC.We have studied the good synergy produced by the combination of shortcutting and vector.Moreover,we decompose it into multiple subtasks for parallel.They are executed by multiple threads for data preprocessing,which speeds up the algorithm.Take our experimental results under the largest data set as an example.(1)Shortcutting-vector algorithm’s performance is1.38× and 1.40× that of other optimization.(2)The time and memory is 21.0%,21.3%and 4.2%,4.3% of the traversal algorithm.(3)Parallel algorithm has 2.57× speedup.Secondly,aiming at the problem of CC in distributed system,we propose TianheCC.We find that BFS can get only one CC at a time,the search of small CCs is slow,and the result contains redundancy.Although the S-V algorithm has better performance for small CCs,it needs to re-establish the storage structure,which causes overhead.Given this problem,our algorithm reuses the storage structure of the two sub-algorithms.For the BFS algorithm,by global vertices scattering and PV bitmap,the communication is reduced and the load balance is promoted.For our proposed 2-Level Broadcast algorithm,broadcasting eigenvalues is divided into two levels to reduce communication.The lockfree impact on the correctness is also resolved by adjusting sequence.(1)The total time,construction time,calculation time,and memory of Tianhe-CC are 64.6%,6.1%,76.9%and 15.0% of Gemini respectively.And it can also support 128 processes.(2)Tianhe-CC increases the speed of Graph500 by 5.60×.The procedure optimized by Tianhe-CC is ranked No.1 on the Graph500 SSSP Ranking and in the BIG Data category of the Green Graph 500.And it is ranked No.2 in the BIG Data category of the Green Graph 500.So Tianhe-CC contributes to Tianhe’s ranking. |