Hypergraph Parallel Computation Framework For Large-scale Complex Data Processing

Posted on:2015-04-22

Degree:Master

Type:Thesis

Country:China

Candidate:X Y Qin

Full Text:PDF

GTID:2180330452964156

Subject:Computer Science and Technology

Abstract/Summary:

PDF Full Text Request

Nowadays, big data which is used to describe and defne the massive data generat-ed in the era of information explosion, has been mentioned more and more often. Withthe coming of the era of cloud computing, big data is able to demonstrate its power.In the past, people used to adopt the approach of sampling which only compute on asmall subset of data. Nowadays, thanks to the technique of big data, users can performcomputation on the the full data, which requires the support of an efective distributedcomputing environment essentially. However, previous research focus on the low-levelfeld which is hard to provide an consistent primitive, and forced user to write diferentlow-level programs for diferent applications.MapReduceabstractionsolvedthisissueefectivelybyprovidingsimpleprogram-ming model which allows users to focus on high-level application instead of low-levelprogram. However, as a result of this kind of simplicity, it’s hard to compute on com-plex data, users have to write a bunch of MapReduce jobs which result in the inef-ciency of execution. A new model is required which should support the representa-tion of complex data and provide simple programming model. Graph-based parallelcomputation frameworks solved this issue, it represents data by graph and provides ahigh-levelprogrammingmodelbyoperationsdefnedonthevertices. Atthesametime,the low-level optimizations of these frameworks include the graph representation andpartitioning etc. Representative of these frameworks includes Pregel and GraphLab.Pregel is based on BSP model which make use of the super-step abstraction, computa-tions in a super-step will not be executed until the last one has been performed. Thisstrategy may cause the issue that some computing nodes may become the bottleneck ofthe entire system. GraphLab provides the execution of synchronous and asynchronous, the algorithms which do not need the synchronization of super-step will be executedefectively. However, the graph-based framework are all based on simple graph whichmeans one edge in the graph can only connect two vertices. The drawback of thesemodels is that it’s hard to represent the group information which often exists in thesocial network, and will cause information loss.Thus, in order to solve the issues mentioned above, we propose a hypergraph-based parallel computation framework which named HGPC, the data are representedbyhypergraphwhichmeansanedgecalledhyperedgeinthisgraphcanconnecttwoandmore vertices. Hyperedge stands for the concept of group naturally. In this paper, wedescribethedatarepresentationinHGPC,andproposeasimpleprogrammodelnamedMAN.Additionally, wediscusstwohypergraphpartitioningapproacheswhichadoptedin HGPC, they are randomized greedy strategy and parallel multi-level partitioningalgorithm.InordertoverifytheefectivenessofHGPCframework, wefetcheddatafromsinaweibo. For the purpose of fetching data from social network like sina weibo efectivelyand fexibly, we also design and implement a scalable distributed crawling frameworknamed COLA. The experiments demonstrate the efectiveness and scalability of CO-LA.WedescribeandimplementthelinkpredictionalgorithminhypergraphsbasedonHGPC. The experiment which make use of the fetched data demonstrates that, HGPCis able to describe the algorithm based on hypergraph model sufciently, and executedthese algorithms efectively.

Keywords/Search Tags:

big data, hypergraph, parallel computing, graph-based computation framework

PDF Full Text Request

Related items

1	Research On The Maximum Flow Acceleration Algorithm Of Large-scale Graph Based On Parallel Graph Computing Framework
2	A Tensor Based Big Data Efficicent Computation And Multimodal Analysis Apppoach
3	Research On Several Distributed Graph Processing Algorithms And A Unified Graph Programming Framework
4	A Parallelization Software Framework For Geo-data Conversion Algorithm
5	Parallel Optimization Of W-term Correction Algorithm Based On DALiuGE Computing Framework
6	Design And Implementation Of Graph Computing System Based On Ignite
7	Massive Data Many Task Parallel Data Framework For GWAS
8	Research On The Key Technology Of HopeFOAM:A Discontinuous Finite Element Based High Order Parallel Computing Framework
9	Research And Implementation Of Parallel Computer Architecture For Graph Search
10	Arithmetical Computation Via Self-Assembly Of DNA