Font Size: a A A

Research On JA-BE-JA Graph Partitioning Algorithm Based On BSP-spark

Posted on:2018-02-14Degree:MasterType:Thesis
Country:ChinaCandidate:S S GuoFull Text:PDF
GTID:2370330596954794Subject:Software engineering
Abstract/Summary:PDF Full Text Request
In recent years,with the rapid development of information technology,graph algorithm has made rapid progress.And it is urgent to deal with the large scale graph data.The traditional centralized graph partitioning algorithm has complex graph data processing,thus the emergence of including the Spark,and other distributed parallel map data processing framework.They are used to solve the problems of distributed parallel graph partitioning.Although many solutions have been put forward and the theory is becoming more and more perfect.The equilibrium graph partition is still a NP problem.Some traditional algorithms cannot be directly applied to large-scale distributed scene.And the traditional graph partitioning mostly contains the global frequent operation,leads to a huge access overhead,and exists storage optimization problems on distributed machine etc.How to efficiently store and partition the graph data,and improve the operation performance of large graph partition becomes a new challenge.Based on the above problems,this thesis puts forward the improved JA-BE-JA fully distributed algorithm.The algorithm is combined with simulated annealing technique and data migration algorithm for edge-cut and vertex-cut.And it uses the heuristic method to solve the vertex centered equilibrium graph partitioning problem.The advantage of this algorithm is that it can process each vertex data of the graph data in parallel,and only need to process the direct neighbor vertex each partition.The main contributions of this thesis are:(1)This thesis studies the main techniques of the parallel graph processing system.The results show that the BSP model can ensure the certainty and parallelism of the computation,and is suitable for the distributed computing framework.So a distributed computing framework based on BSP model is proposed.It’s called BSP-Spark and its performance of the data partition module is optimized.(2)Based on the original JA-BE-JA distributed graph partitioning algorithm,an improved simulated annealing technique is proposed.It takes into account the time performance of the simulated annealing technique and the parameters of the random process,and avoids the original JA-BE-JA algorithm into a local optimal bottleneck.(3)Based on the original JA-BE-JA distributed graph partitioning algorithm,an improved data migration algorithm is used to fully consider the performance and migration cost of the graph segment to save the bandwidth and improve the resource utilization rate,and realize the dynamic data migration strategy.Through the theoretical analysis and related experiments,the improved JA-BE-JA algorithm is used to deal with large-scale distributed graph partitioning in the BSP-Spark distributed cluster.Especially in dealing with large-scale power-law distribution map,it is more effective than the traditional edge segmentation algorithm.
Keywords/Search Tags:Distributed graph partitioning, BSP model, BSP-Spark, edge-cut, vertex-cut
PDF Full Text Request
Related items