Font Size: a A A

Research On Edge Priority-based Partitioning Algorithmin In Static Power Law Graph

Posted on:2022-04-28Degree:MasterType:Thesis
Country:ChinaCandidate:D YuFull Text:PDF
GTID:2480306521979919Subject:Computer software and theory
Abstract/Summary:PDF Full Text Request
In recent years,with the rapid development of hardware technology and the explosive growth of network data,the scale of complex networks is also expanding rapidly.As a result,in the past ten years,many efficient solutions for processing and analyzing large-scale graph data have been proposed and subsequently applied to various scenarios.In the processing of large-scale data,distributed processing will inevitably occur,also in the distributed parallel computing of graph data,and the division of data is a prerequisite for distributed computing,the quality of the results of the division directly affects the performance of distributed Graph processing and tasks throughput of cluster devices.Efficient partitioning algorithm will comprehensively consider the balance of partition capacity and network synchronization overhead,which has always been an NP problem.Researchers have tried to solve the problem of graph partitioning from various angles in the past.Nowadays,as the scale of the network is increasing day by day,some traditional graph partitioning algorithms are due to frequent global operations,such as spectrum analysis,recursive matrix operations,etc.,and their excessive space-time overhead makes it difficult to divide large graphs,they are no longer lightweight facing with these large-scale graphs,and the quality of division is not good.Some existing distributed partitioning algorithms for large graphs are mostly based on Hash,but they ignore the internal structure of the graph,resulting in excessive cut edge replication rate or vertex replication rate,which makes too high network synchronization overhead in the distributed processing of graph.Therefore,how to divide graph in an effective time bound and ensure a lower boundary data replication rate has very important research significance for distributed graph processing.Based on the above problems,this article is oriented to most of the power-law graph in offline scenes.In the mode of cutting vertices,a heuristic partition method(EPH)based on edge priority is proposed.EPH refers to the degree distribution in the law graph and follows the principle of cutting vertices with as large degree as possible.At the beginning,the edges of the whole graph are collected according to the comprehensive ranking of vertex degree and local centrality,and a processing sequence with priority from high to low is constructed,and then the edges are processed according to the EPH partitioner.As the partitioning process continues,the standard deviation of the capacity of partitions gradually stabilizes,and the back edges in the sequence is often adjacent edges of the vertex with a higher degree or a larger local centrality,and these edges are more easier to be assigned evenly.In the actual evaluation,we selected multiple sets of real-world network graph and synthetic graph,and selected several commonly used partitioners for comparison.We apply the results of EPH to the Graph X.The test results show that the overall RF value of EPH is greatly reduced,and the execution efficiency of conventional distributed graph algorithms is greater under the same data,same partition and different division methods,the balance and stability of the partition capacity of EPH are also better.On this basis,we extend EPH to a parallel multi-stage heuristic partition method(Par-EPH).Par-EPH splits the EPH into a serial stage and a parallel iteration stage.In the serial stage,the initialization of each partition is still carried out according to the EPH mode,and the parallel iteration can be started when a stable state is reached.In a superstep,each processing unit is executing locally as EPH mode,and the global partition information is synchronized at the end of the super step.The total number of iterations can be set manually.When a larger number of iterations is set,more frequent globals can be obtained,which can make the partition quality closer to the EPH method.When the number of iterations is too large,it will degenerate to the EPH method.Through experimental tests,the partition quality of Par-EPH is slightly lower than that of EPH,but it is also a better lightweight parallel partition method with high practicability.
Keywords/Search Tags:Graph Partition, Graph X, Parallel Computing, Degree Based
PDF Full Text Request
Related items