Font Size: a A A

Research On Reachability Indexing For Very Large Graphs

Posted on:2016-12-04Degree:MasterType:Thesis
Country:ChinaCandidate:Z LiFull Text:PDF
GTID:2310330503977888Subject:Computer Science and Technology
Abstract/Summary:PDF Full Text Request
Graph which is composed of vertices and edges can be used to depict objects and their relations. Graph has been widely used as data model for many domains, such as Semantic Web, bioinformatics, ontology, and so on. Applications of these areas need to deal with scalability issues on mega-scale data. One of the fundamental graph operations is reachability query. Reachability query asks whether there exists a path between two vertices. Current research focuses on very large graph data with the advent of new applications. Existing methods use index to compress transitive closure and trade-off indexing time, space and query time. In this thesis, we first give vertices a level number after topology sort and then according to the properties of the level number in number theory, we select part of ancestors and descendants of each vertex to construct 2-Hop label. The main contributions are:(1) Give the formal definitions and theoretical basis for the label. We introduce the concept of order on vertices and choose part of order-increasing ancestors and descendants of each vertex to construct 2-Hop label. The correctness of the label is ensured by level number's properties in number theory. In addition, according to the properties in number theory, we put forward an effective method to deal with cross-level edges by adding no more than one dummy vertex for each cross-level edge to maintain the reachability between ancestors and descendants.(2) Design index construction algorithm. By introducing the concept of increasing sequence and decreasing sequence and using their properties in number theory, we propose an efficient index construction algorithm to avoid computing the transitive closure.(3) Optimize the label. We delete dummy vertices to reduce the size of the label and delete high degree vertices based on dummy edges to reduce the size of the label in the construction process.(4) Experiment on large real graphs and large synthetic graphs. We evaluate all methods in terms of the indexing time, space and query time. First we analysis the effect of the optimization. The optimization of deleting dummy vertices will bring some cost of computing, but it can effectively reduce the size of the label. The choice of K in deleting top K high degree vertices based on dummy edges has little effect on performance. We also compare our algorithm with the best existing methods and our algorithm has some advantages in indexing time and query time. Finally, we experiment on large synthetic graphs to assess the scalability.
Keywords/Search Tags:Reachability Query, Graph, Index Algorithm, 2-Hop Labeling
PDF Full Text Request
Related items