Parallel Design And Research On Community Detection In Complex Networks

Posted on: 2016-10-10
Degree: Master
Type: Thesis
Country: China
Candidate: J W Yu
Full Text: PDF
GTID: 2180330470468728
Subject: Computer Science and Technology
Abstract/Summary:
With the rapid development of the internet and the spread of information technology, the era of rapidly accumulating massive data has arrived. How to process massive data quickly and effectively is now one of the most active topics in information technology. Meeting the inherent challenges of managing big data is not only a matter of time; it also requires appropriate computing infrastructure, so that data processing and analysis can be handled efficiently. The emergence of cloud computing eases the problem of expensive hardware and offers a practical way to process massive data. The parallel computing model of cloud computing assumes that no individual node server can be trusted to stay available: multiple copies of the same data are stored on different nodes, so a stable cloud computing system can be built from unstable computing nodes. The currently popular Hadoop, an open source project of the Apache foundation, provides developers with a basic framework for distributed systems. This paper studies an algorithm for the discretization of continuous attributes and an algorithm for community detection in complex networks, and proposes a parallel version of each based on the MapReduce framework. The main work of this paper is as follows.

1) To address the shortcomings of the traditional algorithm for discretizing continuous attributes, this paper proposes a parallel Chi2 algorithm based on the MapReduce framework to improve the preprocessing of massive data. Following an in-depth study of how the traditional Chi2 algorithm can be parallelized, this paper designs and implements the corresponding functions under the MapReduce framework. The discretization order is adjusted according to the importance of each attribute. Experimental results show that the Chi2 algorithm combined with the MapReduce programming model has good scalability and high efficiency, and provides an effective method for the rapid processing of massive data.

2) Community detection is commonly used to explore the community structure of large complex networks. However, algorithms that must compute the shortest paths between every pair of nodes, such as the Girvan-Newman algorithm, are limited in the size of network they can handle. To solve this problem, this paper proposes a parallel version of the Girvan-Newman algorithm programmed under the MapReduce model (MR-GN), a new method that supports large-scale networks. The MR-GN algorithm is implemented with the open source MapReduce framework on the Hadoop platform. Experiments show that the running time decreases linearly as the number of reducers increases linearly. Once the number of reducers matches the data size, the running-time curve levels off.
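The abstract does not give the details of MR-GN, so the following is only a minimal single-process sketch of how a Girvan-Newman step could be split into map and reduce phases. It assumes an undirected, unweighted graph stored as a dict of neighbour sets with integer node labels; the function names (`map_source`, `reduce_edges`, `girvan_newman_split`) are hypothetical and are not taken from the thesis or from the Hadoop API. Each map call runs one breadth-first search and emits partial edge-betweenness credits (Brandes-style accumulation), and the reduce step sums the credits per edge.

```python
from collections import defaultdict, deque


def map_source(source, adj):
    """Map step (hypothetical): emit (edge, credit) pairs for one BFS source."""
    dist = {source: 0}
    sigma = defaultdict(float)  # number of shortest paths from source
    sigma[source] = 1.0
    preds = defaultdict(list)   # predecessors on shortest paths
    order = []
    queue = deque([source])
    while queue:
        v = queue.popleft()
        order.append(v)
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
            if dist[w] == dist[v] + 1:
                sigma[w] += sigma[v]
                preds[w].append(v)
    delta = defaultdict(float)
    # Walk back from the farthest nodes, pushing credit toward the source.
    for w in reversed(order):
        for v in preds[w]:
            credit = sigma[v] / sigma[w] * (1.0 + delta[w])
            delta[v] += credit
            yield (min(v, w), max(v, w)), credit


def reduce_edges(pairs):
    """Reduce step (hypothetical): sum the credits emitted for each edge."""
    totals = defaultdict(float)
    for edge, credit in pairs:
        totals[edge] += credit
    return dict(totals)


def edge_betweenness(adj):
    pairs = (pair for s in adj for pair in map_source(s, adj))
    # Every node pair is seen from both endpoints, so halve the totals.
    return {e: c / 2.0 for e, c in reduce_edges(pairs).items()}


def components(adj):
    seen, comps = set(), []
    for start in adj:
        if start in seen:
            continue
        comp, stack = set(), [start]
        while stack:
            v = stack.pop()
            if v not in comp:
                comp.add(v)
                stack.extend(adj[v] - comp)
        seen |= comp
        comps.append(comp)
    return comps


def girvan_newman_split(adj):
    """Remove highest-betweenness edges until the graph gains a component."""
    adj = {v: set(ns) for v, ns in adj.items()}  # work on a copy
    start = len(components(adj))
    while len(components(adj)) == start:
        bet = edge_betweenness(adj)
        if not bet:
            break
        u, v = max(bet, key=bet.get)
        adj[u].discard(v)
        adj[v].discard(u)
    return components(adj)


# Two triangles joined by the bridge 2-3.
graph = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
         3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
print(girvan_newman_split(graph))
```

In this toy graph the bridge between the two triangles carries all nine cross-triangle shortest paths, so it is removed first and the two triangles come out as communities. Because each `map_source` call depends only on its own source node, the calls could in principle be distributed across workers, with the per-edge sums handled by reducers; how the thesis actually partitions the work is not stated in the abstract.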
Keywords/Search Tags: Cloud computing, Discretization, Chi2 algorithm, Community detection, Girvan-Newman algorithm, MapReduce