| Subgraph matching is an important research topic of graph theory.At present,it has been applied to many fields such as social network analysis and functional speculation of protein interaction networks.When the candidate matching set is initialized,many isolated nodes which are invalid are included in the set.Those invalid nodes need to be judged and filtered in subsequent steps,which lower the efficiency of subgraph matching methods.Therefore,as the size of network increases,the scalability of subgraph matching algorithms is weak.In addition,when the size of the query graph increases,the common subgraph matching algorithm can't obtain the matching result within reasonable time.In view of the above problems,this paper improves the subgraph matching algorithm,our specific work is as follows(1)Aiming at the problem that the scalability of subgraph matching algorithm is weak when the target graph is large,this paper proposes an influence-based subgraph match algorithm(InfSMatch).Considering the problem of isolated nodes in candidate matching sets,we calculate the influential value of each node on query graph by combining node's global and local structural feature information.We select the node which has the largest influential value as the central node,which can greatly reduce the number of isolated nodes in the initial candidate matching sets.In order to make the candidate region as small as possible,we use the breadth-first search to extend the sub-region.For the extension process,we also propose more detailed filtering strategies to further pruning the candidate matching set.In addition,in the verification phase,we determine the verification order of the nodes by considering the importance of the nodes and the connectivity between nodes.Experiments show that our proposed method is more efficient than other common algorithms(2)Aiming at the problem that the efficiency of subgraph matching algorithm is low when the query graph is large,this paper proposes a query decomposition based subgraph match algorithm named QDSMatch according to InfSMatch and parallelizes it on Spark.In order to reduce the join cost,we use the community detection algorithm to divide the query graph into multiple query subgraphs.Each query subgraph is operated separately,and the query result of each query subgraph is joined to obtain all isomorphic subgraphs.When matching each query subgraph,we determine the matching order of the query subgraph by calculating the average degree of each query subgraph and the connectivity among query subgraphs.In addition,we also propose a new filtering strategy to help reduce the number of candidate sets.At the same time,the QDSMatch algorithm is parallelized on Spark.The experimental results show that the optimal CPU core number is related to the size of query graph.What's more,our proposed method(QDSMatch)is more efficient than other parallelized subgraph matching algorithms. |