Font Size: a A A

Research On Overlapping Community Detection Based On Local Expansion

Posted on:2020-08-15Degree:DoctorType:Dissertation
Country:ChinaCandidate:Y GaoFull Text:PDF
GTID:1360330590472936Subject:Computer Science and Technology
Abstract/Summary:PDF Full Text Request
Community structure is one of the most important characteristics of complex networks,and it plays an important role in the field of sociology,biology and especially in domains of computer science,where systems are often represented as networks.A community is a set of cohesive nodes in a network that have more links between them and relatively fewer links between them and the rest of the network.Vertices in the same community tend to share common properties or play similar roles in a network.Communities overlap with each other when a vertex belongs to multiple communities simultaneously.A large number of algorithms have been presented on overlapping community detection in the literature,many of which are inaccurate or inefficient 1)at dealing with large real networks,2)at dealing with large-scale dynamic networks.Due to the lack of scalable overlapping community detection algorithms and the dynamics of real-world networks,the task of overlapping community detection in large-scale real networks and large-scale dynamic networks is still an open problem.One of the most adaptable methods for overlapping detection in large networks is that based on local expansion.However,due to the lack of efficient and effective seeding and community refining methods,the algorithms in this kind of method always result in low accuracy or process networks with much priori information of ground truth communities.Thus,the main content of the dissertation is based on local expansion and covers three aspects including seeding methods,community refining methods and overlapping community detection in dynamic networks.First,the selection of seeds always control accuracy of community detection algorithms based on local expansion,howerver,there is a lack of effective seeding methods.Some seeding methods neglect the problem that seeds could overlap with each other,communities expanded from them are nearly the same or densly overlap with each other;some seeding methods select the set containing a core vertex and all its neighbor vertices as a seed by only restricting statistical characteristics of the core vertex,and pay no attention to characteristics of the whole set that is the origin of the final community.To overcome the problems,a parameter-free seeding method based on the total degree and density of subgraphs utilizing merely local structures of a network is proposed,in which overlapping among seeds is not allowed.To the best of our knowledge,this is the first method that total degree and density of local clusters are applied to the selection of seeds.Based on the seeding method,a fast and accurate algorithm DSCM(Density-based Seeding and Conductance Minimizing)for overlapping detection in large-scale networks is proposed.Experimental results in synthetic networks demonstrate that the seeding method outperforms others in the state of the art.Experimental results in large real networks demonstrate that DSCM outperforms the overlapping community detection algorithms in the state of the art,in particular,it is about two orders of magnitude faster than the existing global algorithms,which have a lower quality than DSCM;and it obtains much more accurate community structure than that of the current local algorithms without any priori information of gound-truth community sturcture.Second,to overcome the problem that vertices in the same seed may not be in the same community,a new approach to measure similarity between two vertices is proposed,where vertex influence is taken into account.We define the seed of a community as a line or a triangle of three vertices,which contains a core vertex with relatively large influence and two other members selected by vertex influence and vertex similarity with respect to the core vertex in neighbors of the core vertex,the method not only insure that a seed possesses large total influence but also decrease the probability that vertices in the same seed do not belong to the same community.Based on the seeding method,an overlapping community detection method VISM(Vertex Influence and Similarity Method)based on vertex influence and similarity is proposed.Experimental results in synthetic networks demonstrate that the seeding method achieves relatively good community structure compared to others.Experimental results in large real networks show that VISM outperforms other state of the art overlapping community detection algorithms.In particular,it is faster by orders of magnitude than the existing global overlapping methods aimed at large networks with competitive quality,and it obtains much more accurate community structure than the current local algorithms without any priori information.Third,in order to improve the quality of communities discovered by algorithms based on local expansion,we present three community refinement phases to optimize conductance of communities:(1)inaccurate community affiliations between vertices and communities are modified by node movements;(2)densely overlapping communities are combined with a novel combining function,which takes into account the similarity of the two communities and the fraction of conductance change due to the merging;(3)we take conductance of a community as the benefit function to find communities for outliers of a community partition.Based on the refinement phases,an accurate local overlapping community detection algorithm LECM(Local Expansion and Conductance Minimizing)is presented.Experimental results in synthetic networks show that the refinement phases optimize conductance of communities successfully and largely enhance the community accuracy.Experimental results in large real-world networks show that LECM is superior to others in the state of the art.Finally,we present an adaptive community update method ADIS(Adaptive Community Update based on DIS)based on local expansion for community discovery in dynamic networks.The changes in a dynamic network are divided into four simple events: vertex insertion,edge insertion,vertex removal and edge removal.ADIS first locates a basic community partition on the first network snapshot utilizing DIS,and then updates community structure based on the optimization of conductance and local expansion for each kind of change in each snapshot respectively.ADIS can detect key evolution events in a dynamic network including community birth,community death,community expansion and community contraction.Experimental results in synthetic dynamic networks demonstrate that ADIS performs relatively well in accurately uncovering community structures.
Keywords/Search Tags:overlapping community detection, local expansion, seeding methods, community refinement, dynamic network
PDF Full Text Request
Related items