Font Size: a A A

Research And Implementation On Distributed Partition Algorithm Based On Heuristic Of Large Graph Data

Posted on:2016-11-15Degree:MasterType:Thesis
Country:ChinaCandidate:Y J WuFull Text:PDF
GTID:2370330542957345Subject:Computer technology
Abstract/Summary:PDF Full Text Request
In recent years,due to the development of the social network,graph data size becomes larger and larger.It exceeds the capability of the traditional centralized computing systems with processing such a large calculation,so the application deployment is now transferring from a small amount of HPC server or super computer to commercial cluster in the cloud environment.In order to support complex graph processing calculation(such as Pagerank,the shortest path,etc.),you need to make the data distributed,so computing system on the basis of the BSP model arises at the moment.Graph partition is one of the important problems need to solve on the BSP systems.Especially in a cloud environment,due to the large data size,the big graph data need to be divided into multiple partitions and parallelly processed in the cluster computing nodes.For the specific social network data,user data is often periodically increasing.It is not an efficient way to take an online graph partition every time,because each time in the process of computing the system need to load data and partition it renewedly,which will consume much unnecessary time.And online graph partition is difficult to achieve good classification effect,which means to guarantee the load balancing and interactive edges as few as possible between partitions.Therefore,based on a BSP model our team has developed a larger Graph processing system BC-BSP,and on the basis of this system we designed and implemented outline graph partitioning algorithm on LDG heuristic rules.The main contributions of this article are as follows:(1)we introduce the vertex revenue model and apply LDG heuristic rules on data partitioning,ensuring the load balance of each partition and reducing the interactive edges between the partitions,which will consequently preserve the topology of the graph.(2)we research and find the defect of LDG heuristic rules when vertices are input randomly,so we design two phase partitioning process based on the influence of vertex.That is to partition the influential vertices firstly,and then partition vertices with little influence.So that we can maintain the topology structure of the Graph better.(3)we found that it needs to maintain a global map information if we partition the vertices using the LDG heuristic rules,so it is difficult to expand on the big picture.In order to improve the magnitude of vertex processing that LDG rules can provide,we designs the two schemes.One is using LDG partitioning algorithm based on distributed memory which will map the distribution information of global vertices to different computing nodes to solve the bottleneck of memory footprint of centralized map.Another way is to compress the original graph and then we can partition the compressed graph.We design the Compressing Graph algorithm based on deleting edges with high betweeness,which can make the compressing graph to maintain the graph topology greatly.For the compression graph,we provide two partitioning strategies.One is a partition algorithm based on LDG,and another is to partition the compressing graph using the strategy of the distributed memory,which can greatly expand the magnitude of vertices partition.The data partitioning algorithm proposed in this paper and the idea of once partitioning repeatedly using is applied in BC-BSP system.And the experiment proves that they have good expansibility and stability.
Keywords/Search Tags:Graph partition, Heuristic, Big graph parallel system, Load Balance, Distributed Memory
PDF Full Text Request
Related items