Font Size: a A A

Research On Adaptive Optimization Method For Distributed Graph Computing System

Posted on:2021-05-12Degree:MasterType:Thesis
Country:ChinaCandidate:J H ChenFull Text:PDF
GTID:2370330647456714Subject:Computer technology
Abstract/Summary:PDF Full Text Request
With the advent of the era of big data,industry and academia have developed a number of distributed graph computing frameworks to efficiently analyze and process large amounts of graph data.These frameworks have to properly handle data coherency between replication of vertex.Many frameworks use approach of eager data coherency to solve this problem.This approach treats the replication of the vertex as one indivisible atomic vertex,and any changes on the replication are immediately synchronized to the other replicas,thus incurring frequent global synchronization and communication overheads.For this,we have proposed a method called Lazy Async in our previous work.It is based on the idea of lazy data coherency.The different replicas of the vertex are treated as mutually independent vertex,each iteratively updated,and finally the data is computationally coherence.This method effectively reduces the global synchronization and communication caused by data coherency in the system,and greatly improves the efficiency of the existing distributed graph computation.However,different turn-on strategies will give Lazy Async different levels of performance gains.This leaves Lazy Async to be manually tuned for maximum performance gains.There is still no proven adaptive optimization method to solve the problem of Lazy Async's turn-on strategy.This paper firstly found through experiment analysis that the influence degree of different turn-on strategies on Lazy Async is mainly influenced by redundant computing.Lazy Async avoids the redundant synchronization and communication in the eager data coherency method.But there are redundant computation in its local computation.A turn-on strategy with a low percentage of redundant computation yields better performance.Subsequently,the offline analysis in this paper reveals that there is a pattern of localization of solutions in the iterative process of distributed graph computation.The global solution of a vertex consists only of some local solution,and the localization of solutions helps to reduce redundant computations in local computations.Ultimately this paper designs and implements an adaptive optimization method based on the localities of solutions for Lazy Async.This method online statistically identifies the relationship between global and local solutions on vertices to identify the locality of the solution that emerge during iterations,thereby adaptively selecting the turn-on strategy and automatically obtaining a relatively optimal performance improvement.This method overcomes the disadvantages of previous decision tree methods that required onerous prior manual tuning.And the ability to effectively handle decision tree failures makes Lazy Async a truly practical approach.In the cluster environment of 48 machines,this method is compared with the eager data coherency method on 4 typical graph algorithms and different types of input graphs.Acceleration ratios of 1.2 to 4.7times were obtained that were consistent with manual tuning results.
Keywords/Search Tags:Adaptive Optimization, Lazy Data Coherency, Graph Computing, Distributed Computing
PDF Full Text Request
Related items