Font Size: a A A

Research On Cluster Enumeration Algorithm For Large-scale Bipartite Graphs

Posted on:2020-07-24Degree:MasterType:Thesis
Country:ChinaCandidate:Y HeFull Text:PDF
GTID:2370330590478653Subject:Computer technology
Abstract/Summary:PDF Full Text Request
Data mining is a practical subject.It applies specific solutions according to specific problems,finds the rules in the complicated data,and provides decision-making assistance to the researchers.Graph theory is a discipline that studies the connection of things in the objective world.The combination of the two is to hope to dig out the objective laws of the data in the objective world through specific programs.In today’s society,more and more data can be abstracted into a network structure,so there are more and more data mining problems in the field of graph theory.Among them,there are many data mining problems in the field of bipartite graphs.A lot of research results have been obtained.The main focus of this paper is on the bipartite and maximal bicilque enumeration.Biclique is a subgraph structure in the bipartite graph.Maximal biclique enumeration is of great significance in the real society.It can be applied to many fields,for example,purchase trend prediction,statistical analysis of social networks.To explore some interesting structures of protein interaction networks,e-commerce sites,and other applications.In order to make the research content more practical,the author first analyzes the characteristics of the actual bipartite graph data,and finds that it has the characteristics of large scale and sparse data.These features will play an important role in algorithm design.Among the previous research results,some work did not pay attention to these characteristics.In solving the problem of maximal biclique enumeration,the best method in the previous research results is a parallel solution mrMBEA based on MapReduce framework,which has good scalability and speedup,but at the same time,this solution also has certain defects,such as not using the structural features of the sparse bipartite graph,biasing the task amount estimation,and the like.So this paper designs a parallel solution with better effect.Firstly,this paper starts with the serial algorithm,then parallelizes it to achieve the purpose.After reviewing the data,it is found that there is a kind of iteration based on it.The recursive serial algorithm iMBEA has a significant effect in solving the problem of maximal biclique enumeration,but it also has many shortcomings,such as not analyzing the structural characteristics of the sparse bipartite graph,redundant algorithm process,and it is not easy to parallel.In this paper,a new serial algorithm sMBEA is designed.The performance advantages of the algorithm are also verified by experiments.Based on sMBEA,the parallel solution psMBEA is obtained by a dynamic load balancing strategy and a shared parameter storage structure.The experimental results show that its performance is more efficient than the traditional mrMBEA algorithm.
Keywords/Search Tags:Maximal Bicliques Enumeration, Graph Data Mining, Parallel Algorithms, Sparse Bipartite Graph
PDF Full Text Request
Related items