Community Detection Algorithm Based On Seed Expansion And Its Parallelization

Posted on: 2020-02-18
Degree: Master
Type: Thesis
Country: China
Candidate: Q Liu
GTID: 2370330623957389
Subject: Computer Science and Technology
Abstract/Summary:
Community detection is an active research topic in data mining on complex networks. As networks become more complex, community detection algorithms based on seed expansion have drawn much of the attention in this area in recent years. This approach has clear advantages in seed selection and community expansion. How to choose the most appropriate and accurate seeds, and how to design a strategy for community expansion, are the key questions for maximizing the accuracy and effectiveness of community detection. To address these problems, this thesis proposes and parallelizes a new seed expansion based community detection algorithm, working on both seed selection and community expansion. The specific work is as follows:

(1) Most measures of node importance suffer from one-sidedness and inaccuracy. To address this, the thesis proposes the Local and Global Information based Node Influence Method (LGI). Local and global information are used together to measure the influence of each node in a network, and the top-k nodes are selected as seeds. Experiments show that LGI identifies highly influential nodes and that its node influence ranking is more accurate than that of other centrality methods. Next, the Seed Expansion and LDA based Community Detection Algorithm (SELCDA) is proposed. Each seed together with its neighbors forms an initial community. An LDA topic model based on Gibbs sampling is introduced to estimate the probability that each unassigned node belongs to each topic, which is treated as the probability that the node belongs to the corresponding initial community. Each unassigned node is assigned to the initial community for which this probability is highest, which yields the community structure. Experiments show that SELCDA detects communities more effectively than other recent methods.

(2) To address the excessive overlap among the initial communities of SELCDA and the instability of its community expansion strategy, the thesis proposes the Similarity and Distance based Community Detection Algorithm (SDCDA). A filtering step is added to the seed selection stage: seeds are filtered so that they are not adjacent to one another and the initial communities do not overlap excessively. In the community expansion stage, the priority of each unassigned node is calculated from the similarity between the node and each community and the reciprocal of its distance to that community. Each unassigned node is assigned to the community for which its priority is higher, and the communities are then merged to form the final community structure. Experiments show that SDCDA further improves the accuracy of community detection compared with SELCDA. Finally, SDCDA is parallelized on the Spark parallel computing framework (PSDCDA). Experimental results show that the execution time of PSDCDA decreases as the number of CPU cores increases. Moreover, compared with other parallel community detection algorithms, PSDCDA achieves high accuracy on different large-scale datasets.
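The seed filtering and priority-based expansion described in (2) can be illustrated with a minimal Python sketch. It is not the thesis's implementation, and several parts are assumptions made only for illustration: node degree stands in for the LGI influence score (the abstract does not give the LGI formula), similarity is taken to be the fraction of a node's neighbors that lie inside a community, priority is taken to be the sum of that similarity and the reciprocal of the shortest-path distance to the community, and the final merging step is omitted. All function names are hypothetical.

```python
from collections import deque


def build_adjacency(edges):
    """Build an undirected adjacency map {node: set(neighbors)}."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def select_seeds(adj, scores, k):
    """Pick up to k top-scoring nodes, skipping any node adjacent to a chosen seed."""
    seeds = []
    for node in sorted(adj, key=lambda n: (-scores[n], n)):
        if len(seeds) == k:
            break
        if all(node not in adj[s] for s in seeds):
            seeds.append(node)
    return seeds


def distance_to_community(adj, node, community):
    """Hop count from node to the nearest community member (None if unreachable)."""
    if node in community:
        return 0
    seen = {node}
    queue = deque([(node, 0)])
    while queue:
        current, dist = queue.popleft()
        for nb in adj[current]:
            if nb in community:
                return dist + 1
            if nb not in seen:
                seen.add(nb)
                queue.append((nb, dist + 1))
    return None


def priority(adj, node, community):
    """Assumed priority: neighbor-overlap similarity plus reciprocal distance."""
    similarity = len(adj[node] & community) / len(adj[node])
    dist = distance_to_community(adj, node, community)
    closeness = 1.0 / dist if dist else 0.0  # unreachable contributes nothing
    return similarity + closeness


def expand(adj, seeds):
    """Start from seed-plus-neighbors communities, then place each unassigned node."""
    communities = [{s} | adj[s] for s in seeds]
    assigned = set().union(*communities)
    for node in sorted(set(adj) - assigned):
        best = max(range(len(communities)),
                   key=lambda i: priority(adj, node, communities[i]))
        communities[best].add(node)
    return communities


# Toy graph: two dense groups joined through node 4, plus two peripheral nodes.
edges = [(0, 1), (0, 2), (0, 3), (1, 2), (2, 3), (3, 4), (4, 5), (5, 6),
         (5, 7), (5, 8), (6, 7), (7, 8), (1, 9), (6, 10), (8, 10)]
adj = build_adjacency(edges)
scores = {n: len(adj[n]) for n in adj}  # degree as a placeholder for LGI
seeds = select_seeds(adj, scores, k=2)
communities = expand(adj, seeds)
print(seeds, communities)
```

On this toy graph the adjacency filter rejects high-degree nodes that neighbor an already chosen seed, so the two seeds land in different dense groups, and the peripheral nodes 9 and 10 are the only ones left for the expansion stage to place.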
Keywords/Search Tags: Community Detection, Influence of Node, LDA Model, Community Expansion, Spark Parallelization