Font Size: a A A

Research On Graph Partitioning In Distributed Graph Computing

Posted on:2020-06-24Degree:DoctorType:Dissertation
Country:ChinaCandidate:Q LiFull Text:PDF
GTID:1360330599452733Subject:Software engineering
Abstract/Summary:PDF Full Text Request
Since the 1990 s,the rapid development of information technology,represented by the Internet,has brought human society into the era of the Internet,enabling human beings to interconnect at anytime and anywhere.From city network to global aviation network,from brain network to metabolic network,all kinds of complex networks have filled every aspect of people's lives,including economy,politics,science and so on.The research of complex networks can be widely applied to biology,computer and other disciplines,so that people can better understand the complex systems in the real world.Nowadays,the scale of the network is huge.If neurons in the brain are regarded as vertices and dendrites interconnected between neurons as edges,the whole network contain 89 billion vertices and 100 trillion edges.How to mine and calculate these large-scale map data efficiently is the primary task of studying complex networks.Parallel computing technology is one of the most mature,widely used and feasible computing acceleration technologies.Therefore,it is of great significance to study the parallel acceleration technology of complex network computing.How to allocate tasks among computing units is the key to the performance of parallel computing.Graph partitioning technology is an effective means to solve this problem.The research of graph partitioning is driven by the demand of practical application.The rapid development of cloud computing technology,whether public or private clouds,inevitably leads to cluster heterogeneity.The rapid growth of data volume puts forward new requirements for the partitioning quality,partitioning efficiency and scalability of graph partitioning algorithm.The extensive application of query subgraph represented by graph computing with local characteristics such as finding the shortest path will become a new challenge to the previous graph partitioning algorithm,which makes it difficult for the former partitioning technology to adapt to today's complex and changeable cluster environment and application requirements.In view of the above three challenges,different solutions are proposed.The main contributions of this paper are as follows:1.Aiming at distributed cluster in heterogeneous computing environment,we propose a heterogeneous aware streaming graph partitioning algorithm(HaSGP).According to the different characteristics of hardware performance in parallel heterogeneous environments,this method allocates tasks reasonably and improves the efficiency of distributed graph computing.At the same time,a dynamic caching data management mechanism based on neighborhood structure is designed for streaming graph partitioning,which effectively manages the cached data in the partitioning process.Compared with the existing stream algorithm based on neighborhood structure,the efficiency of partitioning is improved significantly.2.To solve the problem of large-scale graph partitioning,we design a distributed large-scale graph partitioning algorithm based on hierarchical affinity clustering(DisHAP).Specific innovations are as follows: A hierarchical affinity clustering based on Boruvka algorithm is designed as the initial partition.Using vertex similarity as distance measure,two vertices close to each other are iteratively merged,and the vertices with the smallest neighborhood similarity and value in the subgraph are removed to constrain the subgraph with too large scale.In the absence of subsequent optimization,the partitioning quality is close to some existing large graph partitioning methods.To solve the problem of edge-cutting rate optimization between large-scale subgraphs,DisHAP uses dimensionality reduction operation to linearly embed the initial partitioning results into a one-dimensional vertex sequence.By rearranging the vertex sequence by piecewise optimizing edge cutting,the original problem is transformed into several sub-problems with less complexity to facilitate parallel computing.3.A graph partitioning algorithm(Muti-QS)for multi-subgraph query aware is designed.The algorithm divides graph data dynamically according to query history.Experiments show that Muti-QS can achieve up to 76% query execution locally.Moreover,query-aware partitioning is fast because it only operates on a small number of queries rather than a large number of vertices.At the same time,a synchronization model with local and global barriers is proposed.Because of the minimization overhead of local barriers,local barriers can minimize the latency of localized queries.The global barrier can manage dynamic subgraph partitioning well.In order to further reduce the traffic between partitions,Muti-QS uses a separator and combiner to synchronize through non-set communication to reduce the impact of concurrent network communication overhead.
Keywords/Search Tags:Graph Partitioning, Parallel Graph Computing, Complex Networks, Load Balancing
PDF Full Text Request
Related items