Graph Analytics on Modern Graph Processing System

Posted on: 2019-04-17
Degree: Ph.D.
Type: Thesis
University: North Carolina State University
Candidate: Hong, Seokyong
Full Text: PDF
GTID: 2470390017489676
Subject: Computer Science
Abstract/Summary:
The popularity of graph databases has increased significantly in recent years. As graphs grow in volume and find more applications, computational graph analytics has been widely employed to understand and explore real-world problems formulated as graphs. Fundamental tasks in graph analytics include graph pattern matching and graph mining. Graph pattern matching retrieves interesting subgraphs that match the graph patterns and conditions given by users. Graph mining aims to derive hidden properties and knowledge embedded in graph databases by applying various operations. While these two tasks can be performed independently on graphs, they are frequently used together to solve complex graph problems.

Owing to the increased popularity of graph databases and the maturity of graph analytics, recent years have also witnessed a proliferation of commercial and open source systems with which data scientists can accomplish their analytical tasks. However, due to a lack of standards, many of these systems provide features that are limited in functionality, scalability, and usability, making it difficult for users to choose an appropriate system and functionality for building a given application.

In graph pattern matching, recent systems most commonly allow users to describe graph patterns through a query or programming interface; the patterns are then optimized and processed by an underlying graph engine. The impact of the diversity observed in their interfaces, computational models, and levels of optimization has not been well evaluated across different data models in the literature. This makes it difficult for data scientists to understand the advantages and limitations of these systems and, in turn, limits users' ability to choose a suitable system that meets their expectations.

Another limitation of current graph database systems is that they are hand-optimized for built-in graph mining operations. Though such optimized operations are important for graph analysis, current systems offer only a limited number of them, and the optimizations vary from system to system. For example, RDF-based graph systems do not natively support graph mining operations. As a result, graphs are frequently converted and loaded into different systems to apply various graph mining algorithms, which degrades the efficiency of graph analysis.

To understand the ecosystem of graph databases and their functionalities, in this thesis we first evaluated six recent graph processing systems based on two modern graph data models (RDF and property graph) with graph pattern matching workloads. By benchmarking major graph database systems, we discuss their advantages and limitations both quantitatively and qualitatively. Second, we implemented five important graph mining operations and optimized them in RDF-based systems to augment the functionality of these platforms. Third, we focused on graph clustering analysis and implemented two density-based clustering operations: one that considers non-structural attributes of graph entities represented as RDF, and another that considers structural properties of graph entities represented as a matrix. Finally, we conclude the thesis with a few possible research directions.
Keywords/Search Tags: Graph analytics, Graph databases, Graph entities represented, Graph processing, Modern graph, System, Graph pattern matching, Graph mining