Font Size: a A A

Research On Distributed Frequent Graph Pattern Mining Method Based On Simulation Matching

Posted on:2021-02-03Degree:MasterType:Thesis
Country:ChinaCandidate:G Q HuaFull Text:PDF
GTID:2370330602483766Subject:Software engineering
Abstract/Summary:PDF Full Text Request
The goal of frequent pattern mining is to find all frequent patterns in the dataset.Then find the potential knowledge contained.According to the types of data objects mined,patterns can be divided into transactions,sequences,item sets and graphs.Mining frequent graph patterns in graph data is called frequent graph pattern mining.The goal of frequent graph pattern mining is to find out all the graph patterns whose occurrence times are greater than the given minimum support threshold.Frequent graph pattern mining has a very important theoretical and application value,many scholars are also committed to research new and more efficient frequent graph pattern mining algorithm.Graph pattern matching is an important operation in frequent graph pattern mining algorithm.In frequent graph pattern mining algorithm,through graph pattern matching,we can get the matching result of current candidate graph pattern in data graph,and then judge whether the graph pattern is frequent.According to whether the structure of matching results is strict,the notions of graph pattern matching can be divided into two categories:accurate matching and simulation matching.At present,many frequent graph pattern mining work is based on sub graph isomorphism to achieve the exact matching between candidate graph pattern and data graph.Subgraph isomorphism is too strict for matching result structure constraints,so mining in some applications will lose some meaningful frequent graph patterns.Existing simulation matching notions,such as graph simulation and dual simulation.As a new matching notion,simulation matching plays an important role in traffic network monitoring and social network analysis.In the field of frequent graph pattern mining,the existing simulation matching concepts can not be well applied to frequent graph pattern mining due to their loose constraints on the structure of matching results.For example,graph simulation may lead to the matching of connected candidate graph patterns to the unconnected sub-structures in the data graph,can not guarantee the topological structure of matching results,greatly affect the mining quality,and may lead to mining A large number of redundant graph patterns with repeated structures are found out.In order to mine more meaningful graph patterns in graph data,this paper applies simulation matching to match graph patterns and data graphs in the field of frequent graph pattern mining.In view of strong constraints on subgraph isomorphism and the limitations of existing simulation matching notion,this paper proposes a new simulation matching notion,called colSimulation,which defines that the nodes in the candidate pattern map are mapped to the data graph through a single mapping function,so that the matching result of the candidate pattern on the data graph is not less than the scale of the candidate pattern graph itself,and solves the problem of existing simulation matching notions in the frequent graph defects in pattern mining.Aiming at the problem of frequent graph pattern mining,this paper proposes a distributed frequent graph pattern mining method based on colSimulation,which is called SiMine.SiMine adopts the idea of overall synchronous parallel computing model and follows the master-slave device architecture mode.By making appropriate expansion and pruning strategies and designing optimization strategies to improve the computing efficiency of the whole distributed algorithm,SiMine can quickly mine all frequent graph patterns in the data graph.The experimental results on Youtube Video recommendation dataset and other real datasets show that compared with other frequent pattern mining algorithms,the distributed frequent pattern mining method based on colSimulation proposed in this paper is faster,and more meaningful frequent patterns are mined at the same time.With the arrival of the era of big data,emerging applications emerge in endlessly,and the application scenarios of graph data are more and more extensive.This method can be applied to traffic network monitoring,biological information analysis and other emerging fields in the future,in order to get more valuable frequent graph patterns.
Keywords/Search Tags:simulation matching, distributed, graph pattern mining
PDF Full Text Request
Related items