| With the continuous progress of science and technology, more and more data are expressed and stored in the form of graphs. As a hot topic in data mining, subgraph query processing has been widely used in social networks, Web graphs and biochemistry. Because the amount of information on the Internet is growing exponentially, data graphs are structurally complex and very large. However, existing algorithms are inefficient and scale poorly when querying large-scale data graphs, so querying massive data graphs efficiently remains a challenge for current research. To address these problems, this work makes the following contributions. Firstly, a graph compression algorithm based on vertex features is proposed to reduce the scale of the data graph. The algorithm first groups nodes by label and ranks the vertices within each label group by degree. For nodes with the same degree, it compares the information of their neighbor nodes to determine whether they are structurally equivalent, and merges equivalent nodes to obtain a smaller compressed graph. Secondly, to solve the problem that the index structure occupies too much memory, a two-layer index storage structure based on the topology of the compressed graph is proposed. The node degree and topology information of the compressed graph are stored in the first layer, and the label types and counts of the nodes adjacent to each compressed node are stored in the second layer. Thirdly, a Top-K subgraph query algorithm based on the two-layer index is proposed. The nodes of the query graph are sorted by degree, the vertex with the highest degree is selected as the starting node of the query, and a candidate set is obtained by initial pruning with the information stored in the first layer of the index. Candidate query subgraphs are then obtained by filtering on the topology information of adjacent nodes stored in the second layer. The interest degree of each subgraph is then calculated, and the Top-K query subgraphs that satisfy the conditions are output. Finally, experiments on several real data sets verify the time efficiency of the proposed algorithm. |
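
The compression step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not define the equivalence test precisely, so the sketch assumes that two vertices are equivalent when they share a label, a degree and exactly the same neighbor set in an undirected graph without self-loops. The function name `compress_graph` and the dictionary-based graph representation are hypothetical.

```python
from collections import defaultdict


def compress_graph(labels, adj):
    """Merge structurally equivalent vertices (assumed criterion:
    same label, same degree, identical neighbor set).

    labels: dict mapping vertex -> label
    adj:    dict mapping vertex -> set of neighboring vertices
    Returns (classes, compressed_adj), where classes[i] lists the
    original vertices merged into super-node i.
    """
    # Group vertices by label, then by degree within each label group.
    by_label_degree = defaultdict(list)
    for v in adj:
        by_label_degree[(labels[v], len(adj[v]))].append(v)

    # Within each (label, degree) group, compare neighbor information:
    # vertices with identical neighbor sets form one equivalence class.
    classes = []
    for key in sorted(by_label_degree):
        by_neighbors = defaultdict(list)
        for v in by_label_degree[key]:
            by_neighbors[frozenset(adj[v])].append(v)
        classes.extend(sorted(members) for members in by_neighbors.values())

    # Merge each class into one super-node and rebuild the edges.
    super_of = {v: i for i, members in enumerate(classes) for v in members}
    compressed_adj = {i: set() for i in range(len(classes))}
    for v, neighbors in adj.items():
        for u in neighbors:
            if super_of[u] != super_of[v]:
                compressed_adj[super_of[v]].add(super_of[u])
    return classes, compressed_adj


# Example: vertices 2 and 3 share label "B" and the neighbor set {1, 4}.
labels = {1: "A", 2: "B", 3: "B", 4: "C"}
adj = {1: {2, 3}, 2: {1, 4}, 3: {1, 4}, 4: {2, 3}}
classes, compressed_adj = compress_graph(labels, adj)
```

In this example the four-vertex graph shrinks to three super-nodes, because vertices 2 and 3 are merged. Grouping by label and degree first keeps the more expensive neighbor-set comparison confined to small groups, which matches the ordering of steps given in the abstract.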