Font Size: a A A

Targeted Graph Mining for Efficient User-Relevant Knowledge Discover

Posted on:2018-03-29Degree:Ph.DType:Thesis
University:North Carolina State UniversityCandidate:Harenberg, Steven DouglasFull Text:PDF
GTID:2448390005451691Subject:Computer Science
Abstract/Summary:PDF Full Text Request
Mining for user-relevant knowledge from large-scale graph data is at the core of graph data analytics in applications spanning many domains, such as biomedical informatics, social network analysis, and climate science. These applications can often benefit from a targeted enumeration of specific graph substructures, where the output is constrained or influenced by user preferences. For example, a user may only be interested in substructures containing genes known to be associated with Alzheimer's disease. These genes could be viewed as the user's knowledge priors or query vertices.;Targeted graph mining, as introduced in this thesis, aims to help prevent enumerating undesirable or irrelevant patterns and aims to generate output that better relates to the user's preference. Moreover, it can reduce both the computational requirements of the enumeration process as well as the human effort involved in analyzing the results. For example, even if the number of enumerated substructures grows linearly with the graph size, it becomes impossible to manually inspect each substructure for value added to the application knowledge base. Therefore, we posit that targeted algorithms incorporating knowledge priors and user preferences can help facilitate an exploratory and interactive approach towards mining user-relevant knowledge from graph data.;In this dissertation, we explore targeted graph mining algorithms in the context of clique and community detection. In our first component, we propose two memory efficient algorithms for enumerating cliques that are enriched by user-specified knowledge priors. These complementary algorithms consist of a top-down precomputation and index-based method and a bottom-up method. We evaluate the efficiency of our algorithms on a number of real-world networks and demonstrate the practical value by mining a protein interaction network with Alzheimer's Disease biomarkers as the knowledge priors.;In our second component, we develop a hybrid approach to the complementary top-down and bottom-up algorithms. With our approach, we are able to develop a synergy between successive queries over the same graph by designing a state space indexing and querying strategy. By dynamically indexing the constituent state space generated with each query, this strategy is able to reduce redundant computations from similar queries and promote targeted exploration of user-relevant substructures. Experimental results over real-world networks demonstrate our strategy's effectiveness at reducing cumulative query-response time.;Finally, we shift our focus towards enabling a framework for mining user-relevant substructures across an ensemble of graphs, branching into the abundance of community definitions and methods. Our third dissertation component, a comprehensive survey and evaluation on the recent advances in this body of literature, has motivated the idea that developing this framework to be adaptable to the user's desired community detection method is important, as each method may have strengths and weaknesses for different applications. Therefore, in our final component, we develop a top-down framework for discovering persistent and discriminative communities in graph ensembles, while also supporting queries over user-relevant vertices and graphs of interest.
Keywords/Search Tags:Graph, User-relevant, Mining, Over, Knowledge priors
PDF Full Text Request
Related items