Font Size: a A A

Research On Communication Optimization Of Distributed Graph Parallel Computing

Posted on:2018-07-03Degree:MasterType:Thesis
Country:ChinaCandidate:D Y ChangFull Text:PDF
GTID:2310330512987344Subject:Computer application technology
Abstract/Summary:PDF Full Text Request
In recent years,the rapid development of Internet of Things,Internet and social networks,the amount of data stored in the network is growing at an explosive rate,and the connection with the real world is becoming closer.Figure structure,as a very important form of data representation,the size of graphics data with the growth of network data has become increasingly large.After investigation,it is found that the large-scale graph data has an important characteristic: the structure has the power-law distribution characteristic,that is,the number of nodes in the graph is very large,and the degree of most nodes is low.Researchers have proposed many graph computing systems to solve complex large-scale graph calculations.Distributed Graphlab and Power Graph are two representative graph computing systems that demonstrate superior performance,scalability,and fault tolerance.However,there are too many communication overhead between the computing nodes in these,which reduces the processing efficiency,which brings new problems and challenges to the analysis and processing of large scale data.In this paper,some representative graph computing systems are introduced,and the mainstream graph calculation system is analyzed and compared,and their advantages and disadvantages are summarized.After examining the graph data processing system and page ranking algorithm such as GraphLab and PowerGraph,this paper finds that the communication overhead is optimized.The PageRank algorithm is a method used by Google to identify the level / importance of a web page.At present,many link analysis algorithms are derived from the PageRank algorithm.PageRank is important for the calculation of page rankings.Secondly,A new communication mechanism.In this paper,the communication mechanism is integrated into the PowerGraph system,and the PowerGraph system based on the LowGraph communication mechanism is implemented.Then,for the PowerGraph in the abstract The calculation of the synchronization phase blindly synchronizes all mirrored copies,resulting in some unnecessary communication overhead issues,this article has improved PowerGraph,and the improved system named LowPowerGraph.LowPowerGraph identifies and eliminates unnecessary communication in the abstract computing information synchronization and information collection phase to reduce the communication overhead of the page ranking algorithm and improve the graph data processing performance.LowPower Graph eliminates the simultaneous communication of mirrored copies without output edges and the collection of mirrors without input edges.Finally,this paper proposes a edge-aware graph segmentation strategy(EdgeFeel),which optimally isolates the output edge and the input edge for each vertex,increasing the ratio of the mirror copy of the input edge and the mirror copy only to the output edge,Maximize the effects of Low Graph and LowPowerGraph.In this paper,LowGraph,LowPowerGraph and EdgeFeel were validated and validated.The results show that the LowGower mechanism can not only reduce the communication cost,but also reduce the running time of the page ranking algorithm.LowPowerGraph can significantly reduce the communication cost of PowerGraph and improve the performance of PowerGraph graph data processing.The EdgeFeel strategy proposed in this paper can greatly improve LowGraph and LowPowerGraph effects.
Keywords/Search Tags:Distributed, Graph computing, Communication optimization, PageRank, Graph partition
PDF Full Text Request
Related items