Font Size: a A A

Research On The Parallel Query And Analysis Technology Of K-edge-connected Subgraph Over Large Graphs

Posted on:2014-07-20Degree:MasterType:Thesis
Country:ChinaCandidate:W A WangFull Text:PDF
GTID:2250330425491796Subject:Computer technology
Abstract/Summary:PDF Full Text Request
With the development of Internet, the research of data analysis and processing becomes very popular in wide applications such as web network, social network and others. Meanwhile, the scale of web data becomes larger with the advance of computer and network technology. Instead of centralized computing, parallel distributed computing in clusters is becoming the trend. The major advantage of Cloud Computing is the excellent parallel computing abilities, which benefit from its simple and effective parallel programming model. The most representative one is the MapReduce distributed parallel programming model proposed by Google. Moreover, BSP (Bulk Synchronous Parallel) is also a wide used parallel computing model.The purpose of this paper is to study the parallel query and analysis technology of k-edge-connected subgraph over large graphs. This paper chooses two typical problems: triangle counting and maximal k-edge-connected subgraph, and proposes solutions based on BSP model. In triangle counting, this paper proposes a parallel hybrid algorithm which combines the advantages of classical centrialized algorithms. Cut pruning of messages, parallel sample and deduplication of results are put forward to improve the efficiency of parallel algorithms. Moreover, some optimization strategies are presented, such as storage optimizations, algorithm improvement, avoidance of duplicate sampling. All the strategies are given reliable theoretical proofs.In maximal k-edge-connected subgraph, this paper analyzes the centrialized solution and proposes the improved parallel algorithm combining the features of BSP and basic algorithm of centrialized solution. Based on the features of improved parallel algorithm, this paper proposes a variety of pruning and optimization strategies to reduce the number of vertices waiting for process. In which, the threshold is dynamic change. What’s more, this paper also investigates sample as an optional phase for special applications. All the tactics are given reliable theoretical proofs in paper.This paper applies the proposed solutions in BSP and analyzes it. In triangle counting, compared with the Hadoop-based implementation,13times gains can be obtained in terms of executing time. Our message optimization reduces nearly half message and the sampling method performs well. Both the accuracy of the approximation and speedup are very high. In maximal k-edge-connected subgraph, the running time of Hadoop(MapReduce) based algorithm and centralized algorithm is very high with the increase of data size. But in BSP model, the running time of improved algorithm only reaches minutes. In the experiment of cut pruning, the running time of improved algorithm significantly reduces when the threshold increases. In addition, the sampling method for special applications is high in accuracy.
Keywords/Search Tags:BSP, graph processing, triangle counting, maximal k-edge-connected subgraph, cloud computing
PDF Full Text Request
Related items