Font Size: a A A

Research On Efficient Subgraph Matching Algorithm

Posted on:2017-02-19Degree:MasterType:Thesis
Country:ChinaCandidate:X DaiFull Text:PDF
GTID:2180330482479486Subject:Computer Science and Technology
Abstract/Summary:PDF Full Text Request
As a data structure, graph can depict the complex relationship between different objects. So the data mining and management based on graph data enjoys a lot popular in both of research communities and industrials. For these researches and applications, the foundational task is finding a given query graph in a graph dataset, i.e., the exact subgraph matching. This task has great mining to areas such as graph database, bioinformatics, social network analysis. However, the math foundation of this task is the subgraph isomorphism, a NP complete problem. So it is obvious that finding an efficient matching algorithm is very challenge. The current studies have two main problem. The first one is how to reduce the candidate set quickly against a graph with more edges. The second one is how to find a good vertex-search order to speed up the subgraph isomorphism verification. Especially, the later one is the core issue for the exact matching problem.In this paper we propose a Multistage-Graph-Search-Model (MGSM) for this order selection problem, by which we deduce two key issue:feature selection and search cost estimation. The first issue has closed relationship with filtering step, so we optimized filtering technique of related works. For the second issue, we proposed a cost estimating approach named pseudo-tree. Based on it, we come up with a matching algorithm Tps(Three-Phase-Searching). In experiment, we not only find Tps is significantly efficient, but also prove the vertex-search order can also be effective for filtering step as for verification step. What’s more, based on the analysis on both exact and inexact algorithms, we come up with a efficient filtering algorithm HLMA. The experiments prove that HLMA can satisfies the demand of reducing candidate set with fast filtering speed.The works in this paper are quite innovative and creative. The Tps outperforms TurboISO, the best current algorithm, greatly. More than that, the efficiency of Tps benefits by better theoretical foundation, the MSGM. By that we can tell what is the best vertex-search order and review the present algorithms from a consistent viewpoint. Based on it, the pseudo-tree estimation is quite different and more reasonable than other approaches. In one word, the MSGM and the two key issues deduced from it provide blueprint for further researches, and the Tps algorithm is just a preliminary attempt of the blueprint.
Keywords/Search Tags:subgraph matching, subgraph isomorphism search, graph mining
PDF Full Text Request
Related items