Font Size: a A A

Research On The Similarity Search Problem In Graph Databases

Posted on:2022-04-05Degree:MasterType:Thesis
Country:ChinaCandidate:Y H LiFull Text:PDF
GTID:2510306323451244Subject:Software engineering
Abstract/Summary:PDF Full Text Request
At present,the field of Internet is developing rapidly,especially the complex and closely related data is growing by a large margin.These complex and interrelated data are usually represented by graph model.With the wide application of graph,efficient graph data management technology is becoming more and more important.Graph search operation is frequently used in large-scale graph database retrieval,so it is widely concerned.Graph similarity search is a common graph database operation,the most commonly used similarity measure is graph edit distance(GED).It is suitable for any type of graph and can capture the structural differences between two graphs accurately and effectively.In this paper,the graph similarity search problem using graph edit distance as similarity measure is deeply studied.Specifically,given the graph database and a query graph,the user gives the threshold of the editing distance of the graph,and the graph similarity search is in the graph However,the search based on graph editing distance is challenging in large graph database,which has proved NP hard.Most of the existing graph similarity search methods use the filter verify framework.Roughly calculate the lower bound of the edit distance between the query graph and each graph in the given database.If the lower bound of the edit distance is greater than the threshold given by the user,the data graph will be filtered out.On the contrary,the edit distance of the obtained graph is less than or equal to the similarity threshold set by the user,and the qualified graph is taken as the candidate set.Finally,the candidate set is verified.Aiming at the problems of graph partition overlap,index and approximate calculation in graph similarity search,this paper proposes a new filtering method and verification method to further improve the query performance of similarity search algorithm in graph database.The specific work of this graph similarity search method can be summarized roughly as follows:In the first stage,a graph similarity search method based on incremental partition is proposed to overcome the shortcomings of most algorithms in graph partition.It uses the idea of q-gram to enhance partition effect,quickly filters some graphs,and extends the traditional A* algorithm to verify the graphs in candidate sets.Two real data sets,AIDS and protein data sets,were used in the experiment.The results show that the graph similarity search method based on incremental partition is effective.Compared with the existing similarity search methods,similarity search method based on incremental partition has a great improvement in the comparison of response time and candidate set size between verification and filtering.In the second part,this paper proposes a graph similarity search method based on clustering tree index.Firstly,this paper optimizes the traditional k-means algorithm which must initialize k,and constructs a corresponding tree index for each cluster.For a given query graph,if it shares few graph partitions with a cluster center,the corresponding tree index of the cluster will be in the cluster Most of the subtrees will be pruned to quickly filter out a large number of graphs,thus speeding up the search time.The experimental results on two real large-scale data sets AIDS and NASA show that the performance of the algorithm is better than the existing algorithms in index construction time,response time and filtering time.In the third part,aiming at the shortcomings of most graph similarity calculation,agraph similarity search method based on graph neural network is proposed.By using theidea of graph isomorphism network and neural tensor network,the approximate value ofgraph editing distance can be found quickly.The experimental results show that this model is better than othermodels in time.
Keywords/Search Tags:Graph similarity search, tree index, clustering, graph neural network
PDF Full Text Request
Related items