Research And Implementation Of Subgraph Matching Algorithm Based On Spark

Posted on:2018-09-13

Degree:Master

Type:Thesis

Country:China

Candidate:T Guo

Full Text:PDF

GTID:2310330512993281

Subject:Computer technology

Abstract/Summary:

PDF Full Text Request

The graph is a kind of data structure formed on the basis of vertex and edge,and has a very flexible expression ability to describe the relationships between things.With the arrival of big data age,the scale of data is increasing unprecedentedly.Facebook,Twitter,Weibo and other social media has produced a large-scale social data.How to deal with large-scale graph data has become a very critical research problem.The sub-graph matching problem is a very basic and important problem in the field of graph data processing.Also,the search and pattern mining algorithms on the graph must be based on a fast subgraph matching algorithm.The mathematic basis of the sub-graph matching is the classical problem subgraph isomorphism in the graph theory,a well-known NP complete problem.Some of the methods of pure algorithm are given in the previous research work.At present,although some scholars have given some ways to achieve sub-map matching,but these methods can only deal with small-scale graph data.In response to today's large-scale data,the matching efficiency and scalability are very inadequate.In addition,most sub-graph matching algorithms are applied to undirected graphs.And there are problems of inadequacy or inefficiency when dealing with directed graphs.In view of the above problems,this paper relies on the big data platform which has been developed in recent years,and uses it to provide powerful storage and computing power.For the first time,Spark,a big data processing platform,is used as the processing engine to implement a subgraph isomorphism algorithm which can deal with large graph data with GraphX component.The algorithm is based on HDFS as the storage platform,which effectively solves the problem of cluster load balancing.The calculation process is divided into distributed filtering stage and distributed verification stage.In distributed filtering stage,we proposed the vertex signature data structure with taking the characteristics of the Spark platform and the graph segmentation strategy of GraphX with vertices as the split into account.By matching vertex signature parallel,we realized the fast and efficient filtering of the data graph.Besides,the vertex signature expresses itself and neighborhood information,and the parents and children in neighborhood are distinguished,which improves the filtering effect on the directed graph.In the distributed verification phase,the concept of candidate subgraph matching region is proposed by using the Spark platform's distributed computing advantage.Through the calculation of the center vertex of the query graph,a number of candidate subgraph matching regions corresponding to the size of the query graph are obtained in the data graph.The filtered oversized graph data is further segmented to perform efficient subgraph matching operations in the smaller candidate subgraph matching region.Finally,the experiments show that the distributed sub-graph matching algorithm has a good matching efficiency and scalability.In the contrast experiment with the current excellent sub-graph matching algorithm VF2,this algorithm has obvious efficiency advantage.

Keywords/Search Tags:

Spark, subgraph isomorphism, graph mining, parallel algorithm

PDF Full Text Request

Related items

1	Study On Subgraph Isomorphism Algorithm Of DAG
2	Research On Efficient Subgraph Matching Algorithm
3	Motif Detection Algorithm On Network Graph
4	Research On High Performance Algorithms For Constructing The Graphs Without Hexagons
5	Design And Application Of Adaptive Subgraph Matching Algorithm Based On Data Set Characteristics
6	Subgraph Size Related Frequent Approximate Subgraph Mining In Single Graph
7	Study On Steady Frequent Subgraph Mining Algorithm
8	Parallel Matching Algorithms For RDF Graph Type-isomorphism On GPU
9	Research On Subgraph Matching Algorithm Based On Lagre-Scale Graph
10	Research And Implementation Of FPGA-based Multicore Parallel Subgraph Matching Algorithm