Font Size: a A A

Research On Key Technologies Of Distributed Streaming Algorithms Based On Graph Data

Posted on:2020-09-06Degree:MasterType:Thesis
Country:ChinaCandidate:M D YuFull Text:PDF
GTID:2370330596476503Subject:Computer technology
Abstract/Summary:PDF Full Text Request
Recently,as a data structure that can clearly reveal the relationship between data nodes,graph has become a very expressive form of data.The graph data has been widely used in social network analysis,user portraits and other fields.With the continuous expansion of data scale and the introduction of online computing,a graph often contains tens of millions of vertices and hundreds of millions of edges,which is difficult to store completely in memory,and it is difficult to return calculation results in real time.Hence,the distributed algorithms and the streaming algorithms have gradually become a research hotspot.Among the many graph calculation algorithms,the counting of local topology structures,such as triangles,is an extremely important basic task for analyzing large-scale complex networks.At present,the triangle counting algorithm has been widely studied.However,they are generally focused on the single-machine streaming algorithms approximately counting triangles,and the accurate counting algorithms under the distributed architecture.The research on the distributed streaming algorithm for counting triangles in graph stream on multiple machines is still in its infancy,and the stage has not been thoroughly studied.According to the above problem,we proposed three distributed streaming algorithms on multiple machines for estimating the number of triangles in a large-scale simple undirected graphs whose edges arrive as a stream,which based on the current best-performing sample-based streaming algorithm TRIèST.We mainly realize the partition of the graph stream with a distribution strategy,so that each worker independently estimates the number of triangles in a subgraph of the graph stream.Our algorithms reduce the estimation error and improve the computational efficiency.Specifically,the main contributions are as follows:1.Based on the analysis and research of the single-machine triangle estimation streaming algorithm TRIèST-BASE,we propose a improved distributed algorithm DVHT-b(Distributed Vertex Hashing TRIèST-BASE).For each edge stream in the graph data stream,the algorithm uses the hash function to map the two vertex numbers of the edge to the worker number,and map the edge by comparing the mapping values of two vertices.The edge is unicast or multicast to the corresponding worker to participate in the calculation.The probability that each triangle in the graph data stream can be counted by a worker is calculated to get the estimating..We have proved our algorithm implements an unbiased estimate of the number of triangles in the graph data stream under the distributed architecture;2.Based on the analysis and research of the single-machine triangle estimation streaming algorithm TRIèST-IMPR,we propose two improved distributed algorithms,DEHTi(Distributed Edge Hashing TRIèST-IMPR)and DVHT-i(Distributed Vertex Hashing TRIèST-IMPR)algorithm,.The former uses a hash function to calculate the mapping value of the edge stream,which determines whether to accept the edge through a fixed probability,thereby realizing the mapping between the edge and the worker.The probability that a triangle in the graph data stream can be counted is calculated at the aggregator node to implement an unbiased estimate of the triangle number under the distributed architecture.The latter uses the hash function to realize the mapping between the vertex of the edge and the worker,so that the edge is unicasted or broadcasted to the corresponding worker by comparing the mapped values.Using our concept of the packet hierarchical aggregator,our algorithm get the estimate quickly and effectively.We theoretically proved that both algorithms achieve unbiased estimation of the number of triangles in the graph data stream,and reduce the variance of the estimation algorithm;3.Implement the three algorithms using the MPI,and carry out extensive experiments on lots of real-world large-scale graphs.The results demonstrate that our improved algorithms reduce the estimation error and several times more accurate than state-of-theart streaming algorithms.
Keywords/Search Tags:streaming graph, graph algorithm, triangle counting algorithm, distributed streaming algorithm, approximate algorithm
PDF Full Text Request
Related items