
A probabilistic approach to spatial ranking for geographic information retrieval

Posted on: 2005-03-31
Degree: Ph.D.
Type: Thesis
University: University of California, Berkeley
Candidate: Frontiera, Patricia Louise
GTID: 2450390008980120
Subject: Environmental Sciences
Abstract/Summary:
Effective methods for the discovery of geographic information are an essential prerequisite to the information synthesis that is at the heart of environmental planning. Yet the process of resource discovery, and consequently of knowledge discovery, is hindered by the difficulty of locating the most potentially relevant items. This dissertation presents a novel probabilistic approach to geographic information retrieval that is both practically effective and theoretically sound. The approach employs logistic regression to derive the coefficients of a set of models for statistically inferring the probability of geographic relevance of a document with respect to a query as a function of their spatial characteristics.

In this research, geographic content is spatially represented by a geometric approximation of the bounding extent of the region that geographically references, i.e., georeferences, a query or document. These approximations are used to determine the spatial characteristics of the query-document pair, such as their topological relationship and metric properties. Retrieved items are ranked by the probability estimates with the goal of presenting the most geographically relevant items to the user first.

This research compares the performance of the proposed set of logistic regression models to that of five non-probabilistic ranking methods that compute a spatial similarity score for a query-document pair. All methods are applied to a test collection of queries and documents indexed spatially by two convex conservative geometric approximations: the minimum bounding box (MBB) and the convex hull. In the comparison, the tested logistic regression models outperform all of the non-probabilistic methods in terms of information retrieval recall and precision measures. Moreover, statistical tests indicate that the differences in performance between the two types of methods are significant.

The logistic regression models tested in this research on MBB approximations achieve performance levels similar to those achieved by the non-probabilistic methods on convex hulls. This suggests that probabilistic geographic information retrieval offers an alternative to the use of higher-quality spatial representations that are technically difficult to implement. Additionally, this research demonstrates the ability of a probabilistic approach to effectively incorporate information about the geographic context of the query-document pair in the spatial ranking process.
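
The ranking idea described in the abstract can be illustrated with a minimal Python sketch. It is not the dissertation's implementation: the two overlap-ratio features, the coefficient values, and all names (`Box`, `mbb`, `overlap_area`, `relevance_probability`, `rank`, `HYPOTHETICAL_COEF`) are assumptions made for illustration only. The abstract states only that logistic regression maps spatial characteristics of a query-document pair to a probability of relevance, and that items are ranked by that probability.

```python
import math
from typing import Dict, List, NamedTuple, Sequence, Tuple


class Box(NamedTuple):
    """Axis-aligned minimum bounding box (MBB)."""
    min_x: float
    min_y: float
    max_x: float
    max_y: float

    @property
    def area(self) -> float:
        return max(0.0, self.max_x - self.min_x) * max(0.0, self.max_y - self.min_y)


def mbb(points: Sequence[Tuple[float, float]]) -> Box:
    """Compute the MBB of a set of (x, y) points."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return Box(min(xs), min(ys), max(xs), max(ys))


def overlap_area(a: Box, b: Box) -> float:
    """Area of the intersection of two boxes (0.0 if they are disjoint)."""
    w = min(a.max_x, b.max_x) - max(a.min_x, b.min_x)
    h = min(a.max_y, b.max_y) - max(a.min_y, b.min_y)
    return w * h if w > 0 and h > 0 else 0.0


# Hypothetical coefficients (intercept, weight 1, weight 2). In practice they
# would be fitted by logistic regression on relevance-judged training pairs.
HYPOTHETICAL_COEF = (-3.0, 3.0, 3.0)


def relevance_probability(query: Box, doc: Box,
                          coef: Tuple[float, float, float] = HYPOTHETICAL_COEF) -> float:
    """Estimate P(relevant) from two assumed features: overlap/query area
    and overlap/document area."""
    ov = overlap_area(query, doc)
    x1 = ov / query.area if query.area else 0.0
    x2 = ov / doc.area if doc.area else 0.0
    z = coef[0] + coef[1] * x1 + coef[2] * x2
    return 1.0 / (1.0 + math.exp(-z))


def rank(query: Box, docs: Dict[str, Box]) -> List[Tuple[str, float]]:
    """Return (doc_id, probability) pairs, most probably relevant first."""
    scored = [(doc_id, relevance_probability(query, box)) for doc_id, box in docs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    query = Box(0, 0, 10, 10)
    docs = {
        "A": Box(0, 0, 10, 10),    # identical extent
        "B": Box(5, 5, 15, 15),    # partial overlap
        "C": Box(20, 20, 30, 30),  # disjoint
    }
    for doc_id, p in rank(query, docs):
        print(doc_id, round(p, 3))
```

Under these assumed coefficients, the document whose extent coincides with the query ranks first and the disjoint document ranks last. A convex hull representation would replace `Box` with a polygon type and a polygon intersection routine, which is the more difficult implementation path the abstract alludes to.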
Keywords/Search Tags: Geographic, Information, Probabilistic approach, Spatial, Ranking, Query-document pair, Methods, Logistic regression models