Parallel Construction Technology Based On The MapReduce VoR-Tree Index

Posted on:2013-02-15

Degree:Master

Type:Thesis

Country:China

Candidate:F L Chen

Full Text:PDF

GTID:2260330425950993

Subject:Surveying and Mapping project

Abstract/Summary:

PDF Full Text Request

With the promotion of global informatization progress, the amount of spatial data is increasingeveryday, traditional distributed spatial database (DDBMS) has been unable to meet productionneeds. After the release of Google MapReduce parallel programming framework, GIS industrystarted to store massive spatial data in the HDFS, but the modeless characteristic of HDFSmaking it complicated when it comes to storage and retrieval on it. The commonly used spatialindex is R-Tree, it is very efficient to find a query point by using its hierarchical structure, but forthe solution of the problem, for example space NN (Nearest Neighbor), the algorithm alwaystraverse hierarchical structure back and forth in order to find the interesting points, whichincreases the system overhead inevitably; the industry put forward a kind of spatial index whichbased on the Voronoi-Diagram against the space NN problem, it can find the nearest neighbor ofthe specific point very fast, but unlike R-Tree, the index will bring larger disk I/O overhead whensearching the data which stored on the disk. In this paper, we combined the advantage of the twospatial index and construct a new index, called VoR-Tree, by using MapReduce parallelprogramming framework, it can not only retrieve the data on disk in a prompt manner, but alsosuitable for the space NN problem since it equips the Voronoi information of the query point.In this paper, we stored spatial data in HDFS and designed parallelization algorithm solvingMaxRNN and kANN problems based on the spatial index structure of VoR-Tree. Under suchcircumstances, MaxRNN problem, searching contiguous areas, should be transformed into theproblem on the intersections of nearest neighbor circle when parallel computing all query points.Then find out the set of intersections with higher weight and constitute contiguous area,completing the computational tasks that otherwise would require several hours within minutes.As for the kANN problem, we abandoned traditional algorithms of DF and BF using traversalR-tree index to evaluate nearest neighbor of given point. Switching to VoR-Tree, we made thesearch scope of nearest neighbor to the minimum by making use of Voronoi diagrams, to lowerthe system overhead significantly. We conducted several experiments on the solving algorithmsof the parallel build of VoR-Tree and parallel execution of Space NN problem under MapReduceframework. And it proved that building VoR-Tree index took longer than building Hilbert-Rtreeor Voronoi diagrams index. In the experiment of solving MaxRNN problem, we found that thereaction-time of client when executing computational tasks declined by approximate linear law ascluster nodes increased. In the parallel execution of kANN, it could complete computational taskswithin a reasonable time when k value is small, however, when k value increased, the algorithmneeds to be optimized. In this paper, based on MapReduce programming framework, we propose a parallel constructmethod for VoR-Tree, and we also propose a parallel algorithm that based on VoR-Tree spatialindex for solving space kANN problem, where we still adopt the traditional MQM algorithm. Theexperiments proved that the VoR-Tree index can meet space query and compute of TB levelspatial data in HDFS, but the index we constructed is a static spatial index, we seldom considereddynamic reconstruction of the index when the new data will be persist in the storage system.

Keywords/Search Tags:

Spatial databases, MapReduce, spatial index, clusters, parallel computing

PDF Full Text Request

Related items

1	The Parallel Computing Research On High-performance Spatial Analysis Under Cpu/Gpu Heterogeneous Environment
2	Storage And Parallel Query Technology Research In Distributed Environments Massive Spatial Data
3	The Research And Realization Of The Partitioning Strategy Of Vector Spatial Data In Parallel Computing Environment
4	Research On Parallelization Of Large Scale Raster Data Spatial Analysis Algorithm Based On MapRedcue
5	Studying On The Technolgies Of Storage And Processing Of Spatial Vector Data In High-Performance Parallel GIS
6	Research Of Support Vector Regression Modeling Based On Spatial Analysis And Parallel Computing
7	The Research And Realization Of The Parallel Spatial Operation In A Simple Feature Model
8	The Research And Realization On The Spatial Computing Models For Huge Spatial Data
9	Research On Parallel Computing Technology Of CME Parameters Detection Model Based On MapReduce
10	The Key Technologies Of Parallel Spatial Join Based Cloud Computing In GIS