With the advent of the "era of big data", a variety of technologies for processing massive data have emerged; in particular Hadoop, which is based on the traditional MapReduce model, has become the industry standard. In essence, however, MapReduce is an offline batch-processing model, so not all kinds of structured data can be processed efficiently with it, especially data sets with graph structure. Such data therefore calls for a new, efficient computing model.

This paper first discusses a distributed iterative computing model by studying DPark (a distributed iterative computing framework developed by Douban Inc., a Python clone of Spark) to understand how it works and its core abstraction, RDD (Resilient Distributed Datasets). Secondly, many tasks in practical applications involve large-scale graph algorithms, such as Web link analysis and social-relationship graphs; these graphs usually share the same characteristic: large scale (often billions of vertices and trillions of edges). Computing such large-scale data sets efficiently is a major challenge. In this regard, Google presented Pregel, a model for efficient graph computing, but so far there is no fully implemented graph computing engine based on it. Therefore, besides researching the distributed iterative model, we study and implement a graph computing engine on top of DPark, called PyGel, developed in Python. This engine can be used to compute BFS, SSSP, PageRank, and other graph algorithms efficiently. We then compare the runtime of PageRank under the traditional distributed computing model and under PyGel to validate the engine's computational efficiency, and give the relevant comparative data. Finally, by examining the current distributed iterative model, Pregel, and PyGel, we identify possible shortcomings and ways to improve them, and lay out directions for future work.
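As an illustration of the vertex-centric model the abstract refers to, the sketch below simulates Pregel-style supersteps for PageRank on a toy graph in plain Python. The Vertex class, compute() method, and run_pregel() driver are hypothetical names chosen only to mirror the shape of the Pregel programming model described by Google; they are not PyGel's or DPark's actual interfaces.

# Minimal, single-machine sketch of Pregel-style PageRank.
# Hypothetical interface for illustration only; not PyGel's real API.

class Vertex:
    def __init__(self, vid, out_edges):
        self.id = vid
        self.out_edges = out_edges      # ids of target vertices
        self.value = 0.0                # current PageRank estimate
        self.active = True

    def compute(self, messages, superstep, num_vertices):
        """One superstep: combine incoming messages, update the rank,
        and emit messages along outgoing edges."""
        if superstep == 0:
            self.value = 1.0 / num_vertices
        else:
            self.value = 0.15 / num_vertices + 0.85 * sum(messages)
        if superstep < 30:              # run a fixed number of iterations
            share = self.value / max(len(self.out_edges), 1)
            return [(dst, share) for dst in self.out_edges]
        self.active = False             # vote to halt
        return []


def run_pregel(vertices, max_supersteps=31):
    """Synchronous superstep loop: deliver messages, then call compute()."""
    inbox = {v.id: [] for v in vertices}
    for superstep in range(max_supersteps):
        if not any(v.active for v in vertices):
            break
        outbox = {v.id: [] for v in vertices}
        for v in vertices:
            for dst, msg in v.compute(inbox[v.id], superstep, len(vertices)):
                outbox[dst].append(msg)
        inbox = outbox
    return {v.id: v.value for v in vertices}


if __name__ == "__main__":
    # Toy 4-vertex graph: 0 -> 1,2 ; 1 -> 2 ; 2 -> 0 ; 3 -> 2
    graph = [Vertex(0, [1, 2]), Vertex(1, [2]), Vertex(2, [0]), Vertex(3, [2])]
    print(run_pregel(graph))

In a distributed engine such as PyGel the same vertex program would run partitioned across workers, with message delivery and superstep barriers handled by the framework (here, on top of DPark's RDDs) rather than by a local loop.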