Font Size: a A A

FastMiner: A Large-scale Graph Data Mining System

Posted on:2022-03-19Degree:MasterType:Thesis
Country:ChinaCandidate:S X LvFull Text:PDF
GTID:2480306575972329Subject:Computer technology
Abstract/Summary:PDF Full Text Request
The graph is a relatively complex data structure.In the area of computer science,Graph is the collection of the vertices and edges.Compared to linear tables and trees,graphs can represent more information.In real life,Graph is usually used to describe the relationship between a data and another.Such as the link relationship between web pages,the composition of protein molecules and the social network relationships,all of them are presented in the form of graph data structures.As a part of the field of data mining,graph data mining has attracted widespread attention in recent years.As the scale of graph data continues to increase,users need a system to manage and mine graphs.Therefore,how to design a mining system for the large-scale graph data,has become a hot topic at the moment.In recent years,large-scale graph data mining systems have been released continuously,but these systems have the problem of not being strong in the division of graph data;in terms of storage methods,there is a problem of large consumption of space;in access In terms of methods,there are many problems with random access and huge intermediate results.On the whole,there is still a lot of room for optimization in large-scale graph data mining systems.FastMiner,a large-scale graph data mining system,is a system that can efficiently process large-scale graph data.The system proposes a partitioning algorithm of BFS first and then DFS.The BFS traversal tree is generated by breadth first search(BFS),and then depth first search(DFS)is performed on this BFS traversal tree,and the partition of each vertex is determined according to the order of DFS.This division method weakens the coupling between the partitions and strengthens the correlation within the partitions,laying a solid foundation for the storage and access of the graph data in the future.Due to the large scale of graph data,if it is stored in a traditional adjacency matrix or adjacency table structure,a large amount of storage space will be consumed.Based on this,FastMiner uses Compressed sparse row(CSR)as the storage structure,which can effectively save memory resources when the graph is large.Finally,in terms of graph data mining process,FastMiner proposed an Extend-Filter search model.The search model is divided into two parts: Extend and Filter.First,the vertices in the data graph are expanded into Wedges by DFS,and then these wedges are tested for isomorphism,and the conditions are retained.the result of.Finally,after multiple iterations through the BFS method,the subgraph required by the user can be found from the data graph.This search model discards the complete BFS data access method used in the previous graph data mining system,and instead uses the combined access sequence of DFS and BFS.This data access method can effectively reduce the number of intermediate results,while controlling the frequency of random access to a small level.FastMiner has tested the time performance of various algorithms on millions of graph data,and the system has reduced the time consumption of Triangle Counting by about 33%compared to Rstream.Perform even better on the algorithm of Motif Counting and Frequent Subgraph Mining.Experiments show that FastMiner has improved time performance compared to Rstream.
Keywords/Search Tags:Vertex Reorder, Graph Data Storage, Graph Mining System
PDF Full Text Request
Related items