Research on a Distributed Community Detection Algorithm Based on MapReduce

Posted on: 2017-01-05
Degree: Master
Type: Thesis
Country: China
Candidate: P Zhang
Full Text: PDF
GTID: 2180330503482289
Subject: Computer Science and Technology
Abstract/Summary:
Research on community detection algorithms in complex networks has important theoretical significance and practical value. However, with the advent of big data, complex networks have grown so large that traditional memory-based community detection algorithms can no longer cope. Distributed, parallel methods built on big data technology can effectively address community detection in large-scale and very-large-scale complex networks. This thesis makes the following contributions to non-overlapping and overlapping community detection in large-scale complex networks.

First, a distributed processing model for complex networks is constructed on the Hadoop platform. The method for partitioning the original data is designed on top of HDFS. Intermediate results are read and written in batches through HDFS, while HBase is used for dynamic data storage. On top of MapReduce, a multi-stage MapReduce processing model is designed.

Second, to discover non-overlapping communities, a distributed non-overlapping community detection algorithm, MR-SCAN, is proposed, based on the structural clustering algorithm for networks (SCAN). The thesis defines the Direct Node Community (DNC) and the Mergeable Node Community (MNC). The Calculating Node Similarity Algorithm (CNSA) computes the similarity between connected nodes. The Marking and Merging Node Community Algorithm (MMNCA) applies the principle of network structural clustering to compute the DNCs and mark the MNCs; merging all MNCs then yields the final non-overlapping community structure.

Third, to solve the overlapping community detection problem, a distributed overlapping community detection algorithm, MR-ECOCD, is proposed, based on edge clustering. The Direct Edge Community (DEC) and the Mergeable Edge Community (MEC) are defined using edge density clustering. The Calculating Edge Similarity Algorithm (CESA) computes the similarity of adjacent edges. The Marking and Clustering Community Algorithm (MCCA) computes and marks the DECs and then merges all MECs. Finally, the Translating Edge Community Algorithm (TECA) translates the edge communities into node communities, which form the final overlapping community structure.

Finally, the proposed algorithms are implemented, and extensive experiments show that they perform well in terms of both accuracy and efficiency.
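
The abstract does not give the formulas used by CNSA or TECA, so the following is only a minimal sketch under stated assumptions. It uses the structural similarity from the original SCAN algorithm (shared closed neighborhood size divided by the geometric mean of the two neighborhood sizes) as a stand-in for node similarity, and a plain union of edge endpoints as a stand-in for translating edge communities into node communities. All function names are hypothetical, and the code runs on a single machine rather than on Hadoop; `map_edge` only mimics the shape of a map step.

```python
from collections import defaultdict
from math import sqrt


def build_closed_neighborhoods(edges):
    """Return {node: its neighbors plus the node itself} (SCAN's closed neighborhood)."""
    nbrs = defaultdict(set)
    for u, v in edges:
        nbrs[u].update((u, v))
        nbrs[v].update((u, v))
    return nbrs


def map_edge(edge, nbrs):
    """Map-style step: emit (edge, structural similarity) for one connected pair.

    Assumption: similarity = |N[u] & N[v]| / sqrt(|N[u]| * |N[v]|), as in SCAN.
    The thesis's CNSA may define it differently.
    """
    u, v = edge
    common = len(nbrs[u] & nbrs[v])
    return edge, common / sqrt(len(nbrs[u]) * len(nbrs[v]))


def edge_to_node_communities(edge_communities):
    """Turn each edge community into the set of nodes its edges touch.

    A node incident to edges in several communities appears in each of them,
    which is how overlap arises. This is an assumed stand-in for TECA.
    """
    return [{node for edge in community for node in edge}
            for community in edge_communities]


if __name__ == "__main__":
    # Toy graph: a triangle 1-2-3 with a pendant node 4 attached to node 3.
    edges = [(1, 2), (2, 3), (1, 3), (3, 4)]
    nbrs = build_closed_neighborhoods(edges)
    for edge in edges:
        print(map_edge(edge, nbrs))

    # Two hand-made edge communities that share node 3.
    print(edge_to_node_communities([[(1, 2), (2, 3)], [(3, 4)]]))
```

In a real multi-stage MapReduce job, the neighborhood construction would itself be a separate stage whose output is joined with the edge list; the sketch keeps it in memory purely for readability.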
Keywords/Search Tags: Complex networks, Community detection, Overlapping community, Distributed computing, MapReduce