
Design And Study On The Model Of Storing And Processing Massive Spatial Data Concurrently And Efficiently

Posted on: 2015-06-07
Degree: Master
Type: Thesis
Country: China
Candidate: T He
Full Text: PDF
GTID: 2180330473451796
Subject: Geographical Cartography and Geographic Information Engineering

Abstract/Summary:
With the rapid development of modern geospatial information technology, spatial datasets have grown at a remarkable rate. Such large and complex datasets urgently require a technological solution for organizing, storing, processing, and managing them efficiently. Hadoop offers a programming model that can store and access massive data through parallel computation. Building on these characteristics of Hadoop, this thesis carries out the following studies and experiments:

(1) Current mainstream methods for spatial data management are surveyed, and the three core technologies of the Hadoop architecture, namely the distributed file system HDFS, the parallel programming model MapReduce, and the distributed database HBase, are introduced.

(2) Based on an analysis of the HDFS file storage mechanism and of vector data structures, a storage schema for keeping vector data in HDFS is designed, and fundamental vector operations are implemented on it. An improved method is then developed under the MapReduce model to address the problem that the traditional ray-casting method is not well suited to testing whether a large number of points lie inside a polygon (a minimal sketch of the per-point test appears after this abstract). Finally, the vector data produced by the MapReduce jobs are managed with the distributed database HBase.

(3) Through comparison and analysis of HDFS storage methods for small files, a storage schema based on sequence file technology is designed, and HBase data tables are used to manage massive raster data, improving the efficiency of storing, reading, and writing it. In addition, the K-means clustering algorithm is adapted to the MapReduce programming model for efficient concurrent image clustering.

(4) A Hadoop computing platform is set up, and experiments on point-in-polygon testing, file storage, and image clustering are carried out. The experimental results show that the proposed storage schemas can manage massive spatial data efficiently.
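To make item (2) concrete, the following minimal Java sketch shows the ray-casting (even-odd) point-in-polygon test that each map task could apply independently to the points in its input split. The class and method names are illustrative only and are not taken from the thesis; the thesis's actual MapReduce formulation is not reproduced here.

```java
// Minimal sketch: ray-casting (even-odd rule) point-in-polygon test.
// In a MapReduce setting, each mapper would run this test for every
// point in its input split against the broadcast polygon.
public class PointInPolygon {

    // Returns true if point (px, py) lies inside the polygon whose
    // vertices are given in order by xs/ys (not closed). A horizontal
    // ray is cast from the point and edge crossings are counted.
    static boolean contains(double[] xs, double[] ys, double px, double py) {
        boolean inside = false;
        int n = xs.length;
        for (int i = 0, j = n - 1; i < n; j = i++) {
            boolean crosses = (ys[i] > py) != (ys[j] > py)
                    && px < (xs[j] - xs[i]) * (py - ys[i]) / (ys[j] - ys[i]) + xs[i];
            if (crosses) {
                inside = !inside;
            }
        }
        return inside;
    }

    public static void main(String[] args) {
        // Unit square; the first test point is inside, the second is not.
        double[] xs = {0, 1, 1, 0};
        double[] ys = {0, 0, 1, 1};
        System.out.println(contains(xs, ys, 0.5, 0.5)); // true
        System.out.println(contains(xs, ys, 1.5, 0.5)); // false
    }
}
```

Because each point is tested independently, the work partitions naturally across map tasks, which is what makes the batch point-in-polygon problem a good fit for the MapReduce model described above.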
Keywords/Search Tags: Massive Spatial Data, Hadoop, Cluster, Parallel Computation, K-means