The Key Technologies Of Parallel Spatial Join Based Cloud Computing In GIS

Posted on:2017-10-01

Degree:Doctor

Type:Dissertation

Country:China

Candidate:X Y Zhao

Full Text:PDF

GTID:1360330512954372

Subject:Cartography and Geographic Information Engineering

Abstract/Summary:


Spatial join is one of the most important operations in the spatial database systems of Geographic Information Systems (GIS). It is also among the most time-consuming and complicated operations, and it largely determines the query efficiency of a spatial database system. With the development of earth observation, sensor, and computer technology, the volume of spatial data has been growing rapidly. How to process massive spatial data efficiently with spatial join has become one of the key problems in GIS. Cloud computing provides effective support for solving this problem: distributed storage can relieve the pressure of storing massive spatial data, and parallel computing can efficiently execute the complicated and time-consuming geometry algorithms in spatial join. Applying cloud computing technology to spatial join is therefore a current research hotspot and a future development direction in the field of GIS.

Spatial data partitioning is the foundation of parallel spatial join. Evaluating the join condition is the most time-consuming step in parallel spatial join. It is therefore necessary to keep the redundancy among the partitions as low as possible, so that ineffective join operations on the worker nodes are avoided. In addition, the data volume of each partitioned block should be as equal as possible, so that the worker nodes process roughly the same number of tasks, which improves the parallel performance of the system. To meet these requirements, a new data partitioning method named Two Rounds Map Partitioning Method (TRM) is proposed. It effectively reduces the redundant data produced during partitioning while yielding blocks of nearly equal size.
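The two partitioning goals described above, low redundancy and balanced block sizes, can be made concrete with a small sketch. The Python below is not the TRM algorithm, whose details the abstract does not give. It is a plain uniform-grid partitioner, assumed here only for illustration, that copies each bounding rectangle into every grid cell it overlaps and then reports the two quantities a partitioner tries to keep close to 1.0. The names `Rect`, `cells_for`, `partition`, and `partition_stats` are hypothetical.

```python
from collections import defaultdict
from typing import NamedTuple


class Rect(NamedTuple):
    """Axis-aligned minimum bounding rectangle (MBR) of a spatial object."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float


def cells_for(r, cell):
    """Return every grid cell (col, row) that the rectangle overlaps."""
    cols = range(int(r.xmin // cell), int(r.xmax // cell) + 1)
    rows = range(int(r.ymin // cell), int(r.ymax // cell) + 1)
    return [(c, w) for c in cols for w in rows]


def partition(rects, cell):
    """Assign each rectangle to all cells it overlaps (this creates copies)."""
    blocks = defaultdict(list)
    for r in rects:
        for key in cells_for(r, cell):
            blocks[key].append(r)
    return dict(blocks)


def partition_stats(rects, blocks):
    """Redundancy: stored copies per input object (1.0 = no replication).
    Imbalance: largest block divided by the mean block size (1.0 = even)."""
    sizes = [len(b) for b in blocks.values()]
    redundancy = sum(sizes) / len(rects)
    imbalance = max(sizes) / (sum(sizes) / len(sizes))
    return redundancy, imbalance


# Toy data (assumed): one rectangle straddles four cells of a 10 x 10 grid.
rects = [Rect(1, 1, 2, 2), Rect(8, 8, 12, 12), Rect(11, 1, 12, 2), Rect(3, 3, 4, 4)]
blocks = partition(rects, cell=10.0)
redundancy, imbalance = partition_stats(rects, blocks)
print(redundancy)  # 1.75: seven stored copies for four input objects
```

In this toy case the single straddling rectangle is stored four times, which is the kind of redundancy a partitioning method would aim to reduce, and the block at cell (0, 0) holds three of the seven copies, which is the kind of skew a balancing step would aim to even out.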
Then, based on the MapReduce framework, a parallel spatial join algorithm, Parallelizing Spatial Join with Multiple Filter based TRM (TRMMFSJ), is proposed, which effectively improves the efficiency of spatial join on huge data volumes. Finally, building on the proposed methods, an optimized query method is proposed for a special spatial join operation, the Top-k spatial join, which can rapidly extract the most important information, such as the most congested area.

The main contents of this dissertation are as follows:

(1) The Two Rounds Map Partitioning Method (TRM) is proposed. TRM has two advantages: 1) In the first map round, redundant data is reduced by making full use of the spatial attributes of the partitioned objects, and spatial data is partitioned evenly by setting a reasonable threshold; 2) In the second map round, the degree of data balance is further improved by a dynamic mapping mechanism.

(2) Parallelizing Spatial Join with Multiple Filter based TRM (TRMMFSJ) is proposed. TRMMFSJ has three advantages: 1) TRM yields an ideal data partitioning result, which benefits the efficiency of the subsequent parallel spatial join; 2) A multiple filter strategy is applied during the parallel spatial join. The strategy effectively reduces the number of objects in the candidate set and the computational resources consumed in the refinement step; 3) A duplication avoidance method named Grid Cell Location is proposed. With this method, redundant tasks and ineffective spatial join operations can be completely eliminated, so that the efficiency of the algorithm is markedly improved.

(3) An optimized query algorithm for Top-k spatial join is proposed. The algorithm improves the efficiency of Top-k spatial join through two optimizations: 1) A counter is used to compute local statistics of the number of overlapping/contained spatial objects.
The local statistics, rather than the spatial objects themselves, are emitted as output data, so that the resources consumed by data transmission are reduced; 2) The global statistics and the retrieval of the Top-k results are integrated into a single MapReduce job. Multiple MapReduce jobs do not need to be launched, so the efficiency of the algorithm is improved.
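The ideas in items (2) and (3) can be illustrated together on a toy data set. The Python sketch below is an illustration under stated assumptions, not the dissertation's TRMMFSJ or Top-k implementation. It assumes a uniform grid, uses rectangles as the only geometry (so the bounding-box test stands in for both the filter and the refinement step), and applies the well-known reference-point rule for duplicate avoidance, which may differ from the Grid Cell Location method. Each cell reports a pair only if the lower-left corner of the pair's intersection lies in that cell, counts matches locally with a `Counter`, and only those counts are merged to obtain the Top-k. All names (`Rect`, `local_join_counts`, `top_k_join`, `roads`, `sites`) are hypothetical.

```python
from collections import Counter, defaultdict
from typing import NamedTuple


class Rect(NamedTuple):
    """Identified axis-aligned rectangle standing in for a spatial object."""
    id: str
    xmin: float
    ymin: float
    xmax: float
    ymax: float


def intersects(a, b):
    return (a.xmin <= b.xmax and b.xmin <= a.xmax
            and a.ymin <= b.ymax and b.ymin <= a.ymax)


def partition(rects, cell):
    """Copy each rectangle into every grid cell it overlaps."""
    blocks = defaultdict(list)
    for r in rects:
        for c in range(int(r.xmin // cell), int(r.xmax // cell) + 1):
            for w in range(int(r.ymin // cell), int(r.ymax // cell) + 1):
                blocks[(c, w)].append(r)
    return blocks


def local_join_counts(key, left, right, cell):
    """Join one cell and return per-object match counts, not the pairs."""
    counts = Counter()
    for a in left:
        for b in right:
            if not intersects(a, b):
                continue
            # Reference point: lower-left corner of the intersection.
            ref = (int(max(a.xmin, b.xmin) // cell),
                   int(max(a.ymin, b.ymin) // cell))
            if ref == key:  # exactly one cell owns each matching pair
                counts[a.id] += 1
    return counts


def top_k_join(left, right, cell, k):
    """Merge the local counts of all cells and return the k largest."""
    lb, rb = partition(left, cell), partition(right, cell)
    total = Counter()
    for key in lb.keys() & rb.keys():
        total.update(local_join_counts(key, lb[key], rb[key], cell))
    return total.most_common(k)


# Toy data (assumed): road "A" and site "p2" both straddle four cells.
roads = [Rect("A", 0, 0, 15, 15), Rect("B", 20, 20, 25, 25)]
sites = [Rect("p1", 1, 1, 2, 2), Rect("p2", 9, 9, 11, 11),
         Rect("p3", 12, 3, 13, 4), Rect("p4", 21, 21, 22, 22)]
print(top_k_join(roads, sites, cell=10.0, k=2))  # [('A', 3), ('B', 1)]
```

Without the reference-point check, the pair ("A", "p2") would be found in all four cells that both rectangles overlap and "A" would be credited with six matches instead of three. Passing only the small `Counter` objects between the local and global steps mirrors the abstract's point that transmitting statistics is cheaper than transmitting the spatial objects.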

Keywords/Search Tags:

GIS, cloud computing, spatial data partitioning, parallel spatial join


Related items

1	The Research And Realization Of The Partitioning Strategy Of Vector Spatial Data In Parallel Computing Environment
2	Cloud Computing Based Storage And Management On Spatial Vector Data
3	The Key Techniques Of Cloud GIS Based On Hadoop
4	Distributed Parallel Computing Environment Of GML Spatial Data Partitioning Strategy And Algorithm Research
5	The Research And Realization Of The Parallel Spatial Operation In A Simple Feature Model
6	Research On Spatial Large Data Management And High Performance Computing Based On Spatio-Temporal Information Cloud Platform
7	Cloud Computing Environment GML Spatial Data Query And Spatial Analysis
8	Research On The Technology Of Efficient Mass Spatial Data Storage In The Cloud Computing Environment
9	Research On Cloud Storage And Parallel Spatial Clustering Of Graph Data Under Cloud Computing Environment
10	The Research And Realization On The Spatial Computing Models For Huge Spatial Data