Community Detection For Large-scale Networks

Posted on: 2021-03-17  Degree: Doctor  Type: Dissertation
Country: China  Candidate: J Z Wang  Full Text: PDF
GTID: 1360330647454892  Subject: Machine learning and bioinformatics
Abstract/Summary:
Network science has been one of the most active research fields in recent years because it has important applications in many areas, such as biology, informatics, sociology, and computer science. A wide variety of complex systems can be represented uniformly as networks, and the development of network science has significantly advanced our understanding of these systems. One important feature of networks that represent real systems is community structure, or clustering, i.e., the organization of vertices into clusters, with many edges joining vertices of the same cluster and comparatively few edges joining vertices of different clusters. Like the tissues or organs of the human body, these communities can be regarded as relatively independent components or functional groups within the network. Discovering them can greatly improve our understanding of the network, so detecting community structure has become a fundamental problem in network data analysis. Numerous statistical models have been developed in the literature to address this problem. Among them, the stochastic block model (SBM) is one of the most widely studied and applied. Many algorithms have been proposed for fitting the SBM, including the EM algorithm, MCMC methods, variational methods, spectral methods, the pseudo-likelihood algorithm, and the profile-likelihood algorithm. Most of them, however, cannot handle large-scale networks, and they often fall short in community detection accuracy, computational efficiency, convergence theory, or statistical consistency theory.

To address these limitations, this dissertation proposes two novel likelihood-based inference frameworks, the profile pseudo-likelihood framework and the split-likelihood framework, and constructs from them the corresponding profile pseudo-likelihood algorithm and split-likelihood algorithm, which can handle networks with millions of nodes. These algorithms overcome the shortcomings of existing methods listed above and come with both computational and statistical guarantees. In addition, as data collection technology develops, the types of network data that can be obtained are becoming increasingly rich. In particular, for the same set of nodes, we can sometimes collect multiple networks together with attribute data for each node. For this setting, we propose the multi-layer pseudo-likelihood method for multi-layer networks and a penalized alternating factorization (PAF) algorithm for community detection in multi-layer attributed networks.
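
As a rough illustration of the community structure described above, the following Python sketch samples a small two-community network from a stochastic block model and compares the edge density within communities to the density between them. It is not one of the dissertation's algorithms. The function names `sample_sbm` and `edge_densities`, the network size, and the probabilities `p_in` and `p_out` are all assumptions chosen only for illustration.

```python
import random


def sample_sbm(labels, p_in, p_out, seed=0):
    """Sample an undirected graph from a simple SBM (illustrative helper).

    Each pair of nodes is joined independently with probability p_in if
    the two nodes share a community label, and p_out otherwise.
    """
    rng = random.Random(seed)
    n = len(labels)
    edges = []
    for i in range(n):
        for j in range(i + 1, n):
            p = p_in if labels[i] == labels[j] else p_out
            if rng.random() < p:
                edges.append((i, j))
    return edges


def edge_densities(labels, edges):
    """Return (within-community density, between-community density)."""
    n = len(labels)
    total_pairs = n * (n - 1) // 2
    within_pairs = sum(
        1 for i in range(n) for j in range(i + 1, n) if labels[i] == labels[j]
    )
    between_pairs = total_pairs - within_pairs
    within_edges = sum(1 for i, j in edges if labels[i] == labels[j])
    between_edges = len(edges) - within_edges
    return within_edges / within_pairs, between_edges / between_pairs


# Assumed toy setting: 200 nodes split evenly into two communities.
labels = [i % 2 for i in range(200)]
edges = sample_sbm(labels, p_in=0.30, p_out=0.02)
within, between = edge_densities(labels, edges)
print(f"within-community density:  {within:.3f}")
print(f"between-community density: {between:.3f}")
```

In this toy setting the within-community density should come out close to `p_in` and the between-community density close to `p_out`. Community detection is the inverse task: recovering `labels` when only `edges` is observed.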
Keywords/Search Tags: network analysis, profile likelihood, pseudo likelihood, split likelihood, stochastic block model, strong consistency