Research On Outlier Detection Algorithms For Spatial Data

Posted on: 2013-12-05 | Degree: Master | Type: Thesis
Country: China | Candidate: J E Ding | Full Text: PDF
GTID: 2248330395452739 | Subject: Computer application technology
Abstract/Summary:
With the rapid growth of spatial databases and Geographic Information Systems, a tremendous amount of spatial data has been generated. How to store, query and mine data that contains spatial information has become a research focus in computer science and the geographic information disciplines. Spatial data differs from the traditional data used in data mining: it contains both spatial and non-spatial attributes, so it exhibits autocorrelation and heterogeneity constraints. Many current outlier detection algorithms for spatial data rely on methods designed for traditional data, and their results are often unsatisfactory. This thesis focuses on outlier detection for spatial datasets and on how to design spatial outlier detection algorithms that account for both the autocorrelation and heterogeneity constraints. The main contributions are as follows.

(1) Spatial outlier detection consists of two main steps. The first step is to define and compute the spatial neighbor relationship between spatial objects. The second step is to design a dissimilarity formula and calculate the dissimilarity value of each spatial object. Of the two steps, computing the spatial neighbor relationship is the foundation of spatial outlier detection algorithms and has a decisive impact on the dissimilarity calculation. An algorithm based on Delaunay triangulation theory is proposed to build the spatial neighborhood relationship for each spatial object. The algorithms DTNB and Boundary Touch are compared on the TIGER 2008 real dataset in terms of reasonableness and effectiveness. In addition, a dissimilarity formula, SLOI, is designed, and a new spatial outlier detection algorithm, DT-SLOI, is presented, which combines DTNB and SLOI.
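
The neighbor-building step described above can be sketched as follows. This is a minimal illustration and not the thesis's DTNB implementation: it assumes each spatial object is represented by a 2-D point (for example a centroid), it uses a brute-force empty-circumcircle test that is only practical for small inputs, and the function names `circumcircle_contains` and `delaunay_neighbors` are hypothetical.

```python
from itertools import combinations


def circumcircle_contains(a, b, c, p):
    """Return True if p lies strictly inside the circumcircle of triangle abc."""
    # Put a, b, c in counter-clockwise order so the determinant sign is meaningful.
    orient = (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])
    if orient < 0:
        b, c = c, b
    ax, ay = a[0] - p[0], a[1] - p[1]
    bx, by = b[0] - p[0], b[1] - p[1]
    cx, cy = c[0] - p[0], c[1] - p[1]
    det = ((ax * ax + ay * ay) * (bx * cy - cx * by)
           - (bx * bx + by * by) * (ax * cy - cx * ay)
           + (cx * cx + cy * cy) * (ax * by - bx * ay))
    return det > 1e-12


def delaunay_neighbors(points):
    """Map each point index to the indices it shares a Delaunay edge with.

    Brute force: a triangle is kept if no other point falls strictly
    inside its circumcircle (the Delaunay empty-circle property).
    """
    n = len(points)
    neighbors = {i: set() for i in range(n)}
    for i, j, k in combinations(range(n), 3):
        a, b, c = points[i], points[j], points[k]
        orient = (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])
        if abs(orient) < 1e-12:
            continue  # collinear points do not form a triangle
        others = (m for m in range(n) if m not in (i, j, k))
        if any(circumcircle_contains(a, b, c, points[m]) for m in others):
            continue  # circumcircle is not empty, so not a Delaunay triangle
        for u, v in ((i, j), (j, k), (i, k)):
            neighbors[u].add(v)
            neighbors[v].add(u)
    return neighbors


# Usage: four corners of a square plus its center (made-up coordinates).
points = [(0, 0), (2, 0), (2, 2), (0, 2), (1, 1)]
for index, adjacent in delaunay_neighbors(points).items():
    print(index, sorted(adjacent))
```

In this toy layout the center point is adjacent to all four corners, while diagonally opposite corners are not neighbors of each other. For realistic dataset sizes a library triangulation routine would be a more sensible choice than this cubic-or-worse enumeration.
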
The DT-SLOI algorithm has been verified on a synthetic dataset and on the real American demographic dataset us-census2000, and compared with existing spatial outlier detection algorithms. The experimental results demonstrate that DT-SLOI is superior, in terms of correctness and robustness, to existing spatial outlier detection algorithms that use boundary touch to obtain the spatial neighborhood relationship.

(2) Existing spatial outlier detection algorithms apply Euclidean distance directly to the non-spatial attribute vectors of two spatial objects, without considering the internal relationships among each object's non-spatial attributes. As a result, these methods cannot fully reflect the heterogeneity of spatial data. The thesis analyzes the relationships among the non-spatial attributes of each spatial object and divides them into base attributes and interest attributes according to the target of spatial outlier detection. CAP (Categorized Attribute Parameter) is proposed to quantify the relationships between the attributes of each spatial object, so that the dissimilarity of spatial objects' non-spatial attributes can be obtained from both horizontal and vertical perspectives. This method has been verified on the us-census2000 dataset and compared with the method that applies Euclidean distance directly to the non-spatial vectors. The results indicate that the CAP-SOF algorithm clearly reflects the heterogeneity of spatial data and that its spatial outlier detection results are more interpretable.
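
The limitation of applying Euclidean distance directly to non-spatial attribute vectors can be illustrated with a small sketch. It does not implement CAP or CAP-SOF, whose definitions are not given in this abstract. The ratio of one interest attribute to one base attribute is used purely as an assumed stand-in for "a relationship among attributes", the comparison against the neighbors' mean is a generic choice, and all function names and numbers are made up for illustration.

```python
import math


def euclidean(u, v):
    """Plain Euclidean distance between two attribute vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def neighbor_mean(vectors, neighbor_ids):
    """Component-wise mean of the neighbors' attribute vectors."""
    count = len(neighbor_ids)
    dim = len(vectors[neighbor_ids[0]])
    return [sum(vectors[j][d] for j in neighbor_ids) / count for d in range(dim)]


def raw_dissimilarity(vectors, i, neighbor_ids):
    """Baseline: distance from object i to the mean of its neighbors."""
    return euclidean(vectors[i], neighbor_mean(vectors, neighbor_ids))


def ratio_dissimilarity(vectors, i, neighbor_ids, base=0, interest=1):
    """Assumed stand-in (not CAP): compare interest/base ratios instead."""
    own = vectors[i][interest] / vectors[i][base]
    others = [vectors[j][interest] / vectors[j][base] for j in neighbor_ids]
    return abs(own - sum(others) / len(others))


# Hypothetical regions: (base attribute, interest attribute),
# e.g. total population and the size of some subgroup.
vectors = {
    0: (1000, 100),     # small region, same proportion as its neighbors
    1: (10000, 1000),
    2: (12000, 1200),
    3: (9000, 2700),    # similar size, very different proportion
}
for region in (0, 3):
    print(region,
          raw_dissimilarity(vectors, region, [1, 2]),
          ratio_dissimilarity(vectors, region, [1, 2]))
```

With these made-up numbers the raw distance ranks region 0 as more unusual than region 3 simply because it is smaller, whereas the ratio-based comparison singles out region 3. This is the kind of effect that motivates treating base and interest attributes differently.
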
Keywords/Search Tags: spatial data, spatial outlier detection, spatial neighborhood, non-spatial attributes dissimilarity