
Research And Parallelization Of Overlapping Community Discovery Algorithm Based On Local Extension

Posted on: 2022-05-21  Degree: Master  Type: Thesis
Country: China  Candidate: Y Q Wu
GTID: 2480306731978049  Subject: Computer technology
Abstract/Summary:
As networks continue to grow, discovering the overlapping community structure of large-scale complex networks has become a research hotspot in recent years. Researchers have proposed many classic overlapping community discovery algorithms, and those based on local expansion optimization are among the most typical. Such algorithms consist of two stages, seed selection and local expansion, but they still suffer from low community partition quality and are suited only to discovering small communities. This thesis therefore studies and improves the low quality of overlapping community partitions. To improve detection efficiency, it also studies the parallelization of the algorithm and its distributed parallel implementation. The main work is as follows:

(1) Existing local-expansion methods for overlapping communities do not fully consider node importance in their similarity measures, which leads to inaccurate community partitions. To address this, the thesis proposes ROCDNWS, a local-expansion overlapping community discovery algorithm based on node weights and edge weights. The algorithm measures node importance using both the global and the local information of network nodes, and measures node similarity using edge weights. On this basis it introduces a new seed selection method: node importance and similarity are used to compute the fitness between the core node and its neighbor nodes, and the neighbor nodes that optimize the quality function of the initial local community are chosen as the seed set. The seed set is then expanded by repeatedly adding the neighbor node with the highest fitness, which forms the initial communities. Finally, the overlapping communities are obtained by optimizing and merging these initial communities. Experimental results show that, compared with other classic overlapping community discovery algorithms, ROCDNWS performs well in both the quality of overlapping community detection and the accuracy of the algorithm.

(2) To address the low time efficiency of ROCDNWS, the thesis proposes PROCDNWS, a parallel overlapping community discovery algorithm built on the Spark GraphX parallel computing framework. PROCDNWS first parallelizes the computation of node weights and edge weights, producing dightWightRDD (node weights) and edgeWightRDD (edge weights). From these two values it selects the core node and the K neighbors with the largest fitness to construct the initial seed set SeedSRDD. The algorithm then uses the broadcast mechanism to send the node and edge information to each cluster node, expands the communities of the seed set in parallel, and finally merges the communities on each partition. The experiments were carried out on a Spark distributed cluster built for this purpose. The results show that parallelizing ROCDNWS with Spark is feasible and further improves the efficiency of the algorithm.
Keywords/Search Tags: Overlapping community discovery, Community expansion, Spark GraphX framework, Seed set selection
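The two-stage scheme summarized in the abstract (seed selection, then local expansion, then merging) can be illustrated with a minimal serial sketch. It is not the ROCDNWS algorithm: the abstract does not give the thesis's importance, similarity, or fitness formulas, so this sketch assumes node degree as a stand-in for node importance and the widely used local fitness k_in / (k_in + k_out)^alpha as a stand-in for the fitness function. All function names, the `alpha` parameter, and the merge `threshold` are hypothetical.

```python
from collections import defaultdict


def build_graph(edges):
    """Build an undirected adjacency map from an edge list."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def fitness(adj, community, alpha=1.0):
    """Assumed fitness: k_in / (k_in + k_out)^alpha, counted over edge endpoints."""
    k_in = k_out = 0
    for u in community:
        for v in adj[u]:
            if v in community:
                k_in += 1   # each internal edge is counted from both ends
            else:
                k_out += 1
    total = k_in + k_out
    return k_in / total ** alpha if total else 0.0


def expand(adj, seed, alpha=1.0):
    """Greedily add the neighbor that most increases fitness until none does."""
    community = set(seed)
    while True:
        frontier = {v for u in community for v in adj[u]} - community
        best, best_fit = None, fitness(adj, community, alpha)
        for v in sorted(frontier):  # sorted for deterministic tie-breaking
            f = fitness(adj, community | {v}, alpha)
            if f > best_fit:
                best, best_fit = v, f
        if best is None:
            return community
        community.add(best)


def merge(communities, threshold=0.5):
    """Single-pass merge of communities whose overlap ratio reaches the threshold."""
    merged = []
    for c in communities:
        for m in merged:
            if len(c & m) / min(len(c), len(m)) >= threshold:
                m |= c
                break
        else:
            merged.append(set(c))
    return merged


def detect(edges, alpha=1.0, threshold=0.5):
    """Seed from uncovered nodes in decreasing degree order, expand, then merge."""
    adj = build_graph(edges)
    covered, communities = set(), []
    for node in sorted(adj, key=lambda n: (-len(adj[n]), n)):
        if node in covered:
            continue
        community = expand(adj, {node}, alpha)
        communities.append(community)
        covered |= community
    return merge(communities, threshold)


if __name__ == "__main__":
    # Two 4-cliques that share node 4.
    left = [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)]
    right = [(4, 5), (4, 6), (4, 7), (5, 6), (5, 7), (6, 7)]
    print(sorted(sorted(c) for c in detect(left + right)))
    # prints [[1, 2, 3, 4], [4, 5, 6, 7]]
```

On this toy graph, node 4 ends up in both communities, which is the overlap the approach is meant to capture. Expansion recomputes the fitness from scratch for every candidate, which is simple but slow; that cost is the kind of bottleneck the parallel variant described in (2) targets.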