
The Key Technologies Of Parallel Spatial Join Based Cloud Computing In GIS

Posted on: 2017-10-01
Degree: Doctor
Type: Dissertation
Country: China
Candidate: X Y Zhao
GTID: 1360330512954372
Subject: Cartography and Geographic Information Engineering
Abstract/Summary:
Spatial join is one of the most important operations in the spatial database systems of Geographic Information Systems (GIS). It is also among the most time-consuming and complicated, and it largely determines the query efficiency of a spatial database system. With the development of earth observation, sensor, and computer technology, the volume of spatial data has been growing rapidly. How to process massive spatial data efficiently with spatial join operations has become one of the key problems in GIS. Cloud computing provides effective support for solving this problem: distributed storage can relieve the pressure of storing massive spatial data, and parallel computing can efficiently carry out the complicated and time-consuming geometry algorithms used in spatial join. Applying cloud computing to spatial join is therefore a current research hotspot and a future direction of development in GIS.

Spatial data partitioning is the foundation of parallel spatial join. Evaluating the join condition is the most time-consuming step in parallel spatial join. It is therefore necessary to keep the redundancy among the partitioning results as low as possible, so that ineffective spatial join operations on the worker nodes are avoided. In addition, the amount of data in each partitioned block should be as equal as possible, so that the worker nodes process roughly the same number of tasks, which improves the parallel performance of the system. To meet these requirements, a new data partitioning method, named Two Rounds Map Partitioning Method (TRM), is proposed. It effectively reduces the redundant data produced during partitioning while producing partitioned blocks of nearly equal size.
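The two partitioning goals stated above, low redundancy and balanced block sizes, can be made concrete with a minimal sketch. It assumes a plain uniform grid over axis-aligned bounding rectangles that all lie inside a known extent; it is not the TRM method, whose details are not given in this abstract. The function names and the two metrics (replication factor and max-to-mean load) are illustrative assumptions.

```python
from collections import defaultdict


def grid_partition(rects, extent, nx, ny):
    """Assign each rectangle (xmin, ymin, xmax, ymax) to every grid cell it touches.

    Assumption: every rectangle lies inside `extent`.
    Returns a dict mapping (col, row) -> list of rectangle indices.
    """
    ex0, ey0, ex1, ey1 = extent
    cell_w = (ex1 - ex0) / nx
    cell_h = (ey1 - ey0) / ny
    cells = defaultdict(list)
    for rid, (x0, y0, x1, y1) in enumerate(rects):
        # Clamp so that coordinates on the upper extent edge fall in the last cell.
        c0 = min(int((x0 - ex0) / cell_w), nx - 1)
        c1 = min(int((x1 - ex0) / cell_w), nx - 1)
        r0 = min(int((y0 - ey0) / cell_h), ny - 1)
        r1 = min(int((y1 - ey0) / cell_h), ny - 1)
        for c in range(c0, c1 + 1):
            for r in range(r0, r1 + 1):
                cells[(c, r)].append(rid)
    return cells


def redundancy(cells, n_objects):
    """Average number of copies per object (1.0 means no replication)."""
    return sum(len(ids) for ids in cells.values()) / n_objects


def imbalance(cells, n_cells):
    """Largest block size divided by the mean block size (1.0 means perfectly even)."""
    sizes = [len(ids) for ids in cells.values()]
    mean = sum(sizes) / n_cells
    return max(sizes) / mean
```

In this sketch, an object that straddles a cell boundary is copied into every cell it touches, which is exactly the redundancy a partitioning method tries to keep low; a skewed data set would push the imbalance value well above 1.0.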
Then, based on the MapReduce framework, a Parallelizing Spatial Join with Multiple Filter based TRM (TRMMFSJ) is proposed, which effectively improves the efficiency of spatial join on very large data volumes. Finally, building on the proposed method, an optimized query method is proposed for a special spatial join operation, the Top-k spatial join, which can rapidly extract the most important information, such as the most congested area.

The main contributions of this work are as follows:

(1) The Two Rounds Map Partitioning Method is proposed. TRM has two advantages: 1) in the first map round, redundant data is reduced by making full use of the spatial attributes of the partitioned objects, and spatial data is partitioned evenly by setting a reasonable threshold; 2) in the second map round, the degree of data balance is further improved by a dynamic mapping mechanism.

(2) Parallelizing Spatial Join with Multiple Filter based TRM is proposed. TRMMFSJ has three advantages: 1) TRM yields the desired data partitioning result, which benefits the efficient execution of the subsequent parallel spatial join; 2) a multiple filter strategy is used during the parallel spatial join, which effectively reduces the number of objects in the candidate set and the computational resources consumed in the refinement step; 3) a duplication avoidance method named Grid Cell Location is proposed. With this method, redundant tasks and ineffective spatial join operations can be completely eliminated, so the efficiency of the algorithm is significantly improved.

(3) An optimized query algorithm for Top-k spatial join is proposed. The algorithm improves the efficiency of Top-k spatial join through two optimizations: 1) a counter is used to compute local statistics on the number of overlap/containment relationships of each spatial object.
The spatial objects are then replaced by these local statistics as output data, which reduces the resource consumption of data transmission; 2) the global statistics and the Top-k result retrieval are integrated into a single MapReduce job. Multiple MapReduce jobs need not be launched, so the efficiency of the algorithm is improved.
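The local-counting idea behind the Top-k optimization can be sketched in a few lines. This is a single-process simulation under stated assumptions, not the dissertation's algorithm: objects are axis-aligned rectangles, each partition is a hypothetical tuple `(cell, r_objs, s_objs)`, and duplicates are avoided with the widely used reference-point rule (a pair is counted only in the cell that contains the lower-left corner of its intersection), which stands in for the Grid Cell Location method that the abstract does not detail.

```python
import heapq
from collections import Counter


def overlaps(a, b):
    """True if rectangles a and b (xmin, ymin, xmax, ymax) intersect."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def local_counts(cell, r_objs, s_objs):
    """Map side: per R object, count the S objects it overlaps within one cell.

    Only (id, count) pairs leave the partition, not the joined object pairs.
    Cells are treated as half-open [xmin, xmax) x [ymin, ymax), so this sketch
    assumes no reference point lies on the upper edge of the whole extent.
    """
    counts = Counter()
    for rid, r in r_objs:
        for sid, s in s_objs:
            if not overlaps(r, s):
                continue
            # Reference point: lower-left corner of the intersection rectangle.
            px, py = max(r[0], s[0]), max(r[1], s[1])
            if cell[0] <= px < cell[2] and cell[1] <= py < cell[3]:
                counts[rid] += 1
    return counts


def top_k(partitions, k):
    """Reduce side: sum the local counts and pick the k largest in one pass."""
    total = Counter()
    for cell, r_objs, s_objs in partitions:
        total.update(local_counts(cell, r_objs, s_objs))
    return heapq.nlargest(k, total.items(), key=lambda kv: kv[1])
```

Because each partition emits one small count per object instead of every matching pair, the data moved between the map and reduce sides shrinks, and the global sum and the Top-k selection fit in one reduce step.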
Keywords/Search Tags: GIS, cloud computing, spatial data partitioning, parallel spatial join