With the advent of the "era of big data", a variety of technologies for processing massive data have emerged; in particular Hadoop, which is based on the traditional MapReduce model, has become the industry standard. In essence, however, MapReduce is an offline batch-processing model, so not all kinds of structured data can be processed efficiently with it, especially data sets with graph structure. Such data therefore calls for a new, efficient computing model.

This paper first discusses a distributed iterative computing model by studying DPark (a distributed iterative computing framework developed by Douban Inc., a Python clone of Spark) to understand how it works and its core abstraction, RDD (Resilient Distributed Datasets). Secondly, many tasks in practical applications involve large-scale graph algorithms, such as Web link analysis and social-relationship graphs; these graphs usually share the same characteristic: large scale (often billions of vertices and trillions of edges). Computing such large-scale data sets efficiently is a major challenge. In this regard, Google presented Pregel, a model for efficient graph computing, but so far there is no fully implemented graph computing engine based on it. Therefore, besides researching the distributed iterative model, we study and implement a graph computing engine on top of DPark, called PyGel, developed in Python. This engine can be used to compute BFS, SSSP, PageRank, and other graph algorithms efficiently. We then compare the runtime of PageRank under the traditional distributed computing model and under PyGel to validate the engine's computational efficiency, and give the relevant comparative data. Finally, by examining the current distributed iterative model, Pregel, and PyGel, we identify possible shortcomings and ways to improve them, and lay out directions for future work.
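As an illustration of the vertex-centric model the abstract refers to, the sketch below simulates Pregel-style supersteps for PageRank on a toy graph in plain Python. The Vertex class, compute() method, and run_pregel() driver are hypothetical names chosen only to mirror the shape of the Pregel programming model described by Google; they are not PyGel's or DPark's actual interfaces.

# Minimal, single-machine sketch of Pregel-style PageRank.
# Hypothetical interface for illustration only; not PyGel's real API.

class Vertex:
    def __init__(self, vid, out_edges):
        self.id = vid
        self.out_edges = out_edges      # ids of target vertices
        self.value = 0.0                # current PageRank estimate
        self.active = True

    def compute(self, messages, superstep, num_vertices):
        """One superstep: combine incoming messages, update the rank,
        and emit messages along outgoing edges."""
        if superstep == 0:
            self.value = 1.0 / num_vertices
        else:
            self.value = 0.15 / num_vertices + 0.85 * sum(messages)
        if superstep < 30:              # run a fixed number of iterations
            share = self.value / max(len(self.out_edges), 1)
            return [(dst, share) for dst in self.out_edges]
        self.active = False             # vote to halt
        return []


def run_pregel(vertices, max_supersteps=31):
    """Synchronous superstep loop: deliver messages, then call compute()."""
    inbox = {v.id: [] for v in vertices}
    for superstep in range(max_supersteps):
        if not any(v.active for v in vertices):
            break
        outbox = {v.id: [] for v in vertices}
        for v in vertices:
            for dst, msg in v.compute(inbox[v.id], superstep, len(vertices)):
                outbox[dst].append(msg)
        inbox = outbox
    return {v.id: v.value for v in vertices}


if __name__ == "__main__":
    # Toy 4-vertex graph: 0 -> 1,2 ; 1 -> 2 ; 2 -> 0 ; 3 -> 2
    graph = [Vertex(0, [1, 2]), Vertex(1, [2]), Vertex(2, [0]), Vertex(3, [2])]
    print(run_pregel(graph))

In a distributed engine such as PyGel the same vertex program would run partitioned across workers, with message delivery and superstep barriers handled by the framework (here, on top of DPark's RDDs) rather than by a local loop.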