Font Size: a A A

Studies On Query Processing Techniques Over Uncertain Graph Data

Posted on:2012-04-04Degree:DoctorType:Dissertation
Country:ChinaCandidate:Y YuanFull Text:PDF
GTID:1220330467482674Subject:Computer software and theory
Abstract/Summary:PDF Full Text Request
Graph is a general data structure and has been widely used in science and engineering fields, such as computational chemistry, bioinformatics, fluid dynamics, social relations and the World Wide Web. However, there exist noise and errors in modern scientific research methods and measurement techniques, which leads to uncertain graphs, such as uncertain social network, uncertain road network, uncertain protein-protein interaction network and uncertain ontology network. In recent years, uncertain data management has been one of hot topics, and its basic model,"possible worlds", lays a foundation for modeling uncertain graphs, which provides a rich semantics for a wide of applications. Techniques for certain graph data management and uncertain data management cannot be directly used to handle uncertain graph data, and thus new techniques are needed to manage uncertain graph data.Query processing over certain graph data is complicated or even NP-hard, for example, subgraph isomorphism test. On the other hand, the number of "possible worlds" grows in exponential. Therefore, it is more difficult to manage a combination of certain graph data and possible worlds, that is, uncertain graph data.For "shortest path quires over uncertain graphs", we adopt possible world semantics to model uncertain graph data, and based on the model, a novel definition-probabilistic shortest path (P-SP) query is proposed. Firstly, to avoid enumerating all possible worlds, we propose a basic algorithm to exactly compute P-SP query. To further speed up the basic algorithm, we develop three improved approaches based on u-event bounds, including isomorphic graph reduction, disjoint path/cut set bounds, and correlative bounds. Finally, extensive experimental results demonstrate the effectiveness of the proposed algorithms.For "reachability queries over uncertain graphs", we uses real life applications to demonstrate that probabilistic reachability (PR) queries are widely used in biological networks, social networks, ontology networks and RDF networks. Specifically, we study a PR query using the possible world semantics. Though it is proved that processing PR query is a#P-complete problem, we first propose a basic random algorithm to efficiently estimate the reachable probability with a high quality. To further improve the basic method, we introduce conditional distribution in random algorithm called conditional random algorithm (CRA). CRA can process the queries in polynomial time by adopting the disjoint path set and cut set probabilities, subgraph structure probabilities for the conditional distribution. The effectiveness of the proposed solutions for PR queries has been verified through extensive experiments on real uncertain graph datasets.For "subgraph queries over uncertain graphs", we model probabilistic subgraph queries considering conditional probability distribution between nodes and edges under possible world semantics. Though it is proved that the query has#P-complete complexity, we employ a filtering-and-verification framework to speed up the search efficiency. In the filtering phase, we use a probabilistic inverted index, PIndex, composed of subgraph features obtained by an optimal feature selection process. Using this process, PIndex sorts out a large number of uncertain graphs and maximizes the pruning capability. During the verification phase, we develop bound algorithms and a random algorithm based on time evolution to validate the remaining candidates.For "pattern match queries over uncertain planar graphs", we model uncertain pattern match (UPM) queries under the probabilistic semantics. Firstly, to avoid enumerating all possible worlds, we propose a basic deterministic algorithm to exactly compute UPM query. To further speed up the basic algorithm, we develop an improved deterministic approach based on tighten bounds. Next, we design a sampling algorithm to quickly and accurately estimate matching probability. The effectiveness of the proposed solutions for UPM queries has been verified through extensive experiments on real uncertain planar graph datasets.
Keywords/Search Tags:uncertain graph, possible worlds, shortest path query, reachability query, subgraph query, pattern match query
PDF Full Text Request
Related items