Font Size: a A A

Parallel Construction Technology Based On The MapReduce VoR-Tree Index

Posted on:2013-02-15Degree:MasterType:Thesis
Country:ChinaCandidate:F L ChenFull Text:PDF
GTID:2260330425950993Subject:Surveying and Mapping project
Abstract/Summary:PDF Full Text Request
With the promotion of global informatization progress, the amount of spatial data is increasingeveryday, traditional distributed spatial database (DDBMS) has been unable to meet productionneeds. After the release of Google MapReduce parallel programming framework, GIS industrystarted to store massive spatial data in the HDFS, but the modeless characteristic of HDFSmaking it complicated when it comes to storage and retrieval on it. The commonly used spatialindex is R-Tree, it is very efficient to find a query point by using its hierarchical structure, but forthe solution of the problem, for example space NN (Nearest Neighbor), the algorithm alwaystraverse hierarchical structure back and forth in order to find the interesting points, whichincreases the system overhead inevitably; the industry put forward a kind of spatial index whichbased on the Voronoi-Diagram against the space NN problem, it can find the nearest neighbor ofthe specific point very fast, but unlike R-Tree, the index will bring larger disk I/O overhead whensearching the data which stored on the disk. In this paper, we combined the advantage of the twospatial index and construct a new index, called VoR-Tree, by using MapReduce parallelprogramming framework, it can not only retrieve the data on disk in a prompt manner, but alsosuitable for the space NN problem since it equips the Voronoi information of the query point.In this paper, we stored spatial data in HDFS and designed parallelization algorithm solvingMaxRNN and kANN problems based on the spatial index structure of VoR-Tree. Under suchcircumstances, MaxRNN problem, searching contiguous areas, should be transformed into theproblem on the intersections of nearest neighbor circle when parallel computing all query points.Then find out the set of intersections with higher weight and constitute contiguous area,completing the computational tasks that otherwise would require several hours within minutes.As for the kANN problem, we abandoned traditional algorithms of DF and BF using traversalR-tree index to evaluate nearest neighbor of given point. Switching to VoR-Tree, we made thesearch scope of nearest neighbor to the minimum by making use of Voronoi diagrams, to lowerthe system overhead significantly. We conducted several experiments on the solving algorithmsof the parallel build of VoR-Tree and parallel execution of Space NN problem under MapReduceframework. And it proved that building VoR-Tree index took longer than building Hilbert-Rtreeor Voronoi diagrams index. In the experiment of solving MaxRNN problem, we found that thereaction-time of client when executing computational tasks declined by approximate linear law ascluster nodes increased. In the parallel execution of kANN, it could complete computational taskswithin a reasonable time when k value is small, however, when k value increased, the algorithmneeds to be optimized. In this paper, based on MapReduce programming framework, we propose a parallel constructmethod for VoR-Tree, and we also propose a parallel algorithm that based on VoR-Tree spatialindex for solving space kANN problem, where we still adopt the traditional MQM algorithm. Theexperiments proved that the VoR-Tree index can meet space query and compute of TB levelspatial data in HDFS, but the index we constructed is a static spatial index, we seldom considereddynamic reconstruction of the index when the new data will be persist in the storage system.
Keywords/Search Tags:Spatial databases, MapReduce, spatial index, clusters, parallel computing
PDF Full Text Request
Related items