Font Size: a A A

Research On GraphX-based Overlapping Community Discovery Algorithm

Posted on:2022-05-17Degree:MasterType:Thesis
Country:ChinaCandidate:K L YouFull Text:PDF
GTID:2480306494971439Subject:Software engineering
Abstract/Summary:PDF Full Text Request
Network graph is a common concept in daily life.Take social network graph for example,the relationship between people often presents a community structure,and people in the community interact more frequently than those in outside.Mining such community structures can yield meaningful information,such as finding out who has the most influence on social platforms.The research of community discovery has important practical significance and has been a research topic for scholars for a long time.With the deepening of the research on community discovery algorithms,many classic community discovery algorithms have emerged.These algorithms are based on different ideas.This paper mainly studies the overlapping community discovery algorithm based on local expansion.This kind of algorithm is simple,efficient and has certain accuracy,but it has some problems such as inaccurate initial community selection and low efficiency of expansion method.In this paper,the classic local expansion algorithm Doc Net is improved.On the basis of not using the network prior information,combined with the node importance algorithm,the improved DOCLLE algorithm is completed.Finally,the algorithm runs in parallel on the graph computing framework Graph X provided by the distributed computing platform Spark.The main work of this paper is as follows:Firstly,described the research background,significance and current state of community discovery algorithm.Outlined several classical community discovery algorithms.We studied the relevant theoretical knowledge of community discovery and spark,which provides a theoretical basis for improving the algorithm and realizing the parallelization of the algorithm.Second,based on the DocNet algorithm,the initial node selection and node affiliation calculation methods are improved by incorporating the importance ranking algorithm and local similarity concept to make the community division more accurate.Thirdly,the improved algorithm is compared with classical overlapping community discovery algorithms by conducting several groups of control experiments on artificial generated networks and real networks.Moreover,different evaluation criteria are adopted to evaluate the quality and stability of community division of each algorithm.The experimental results show that with the increase of community size,the community partition ability of the proposed algorithm is gradually improved and has higher accuracy than the original algorithm.Fourthly,analyzed the advantages of Spark and graph-processing framework GraphX in graph computation.The proposed algorithm is rewritten in Scala language to run on the Spark distributed computing platform,so as to improve the performance of the algorithm on large data sets.
Keywords/Search Tags:community discovery, overlapping communities, local extensions, importance of node, GraphX
PDF Full Text Request
Related items