
Design And Study On The Model Of Storing And Processing Massive Spatial Data Concurrently And Efficiently

Posted on: 2015-06-07
Degree: Master
Type: Thesis
Country: China
Candidate: T He
Full Text: PDF
GTID: 2180330473451796
Subject: Geographical Cartography and Geographic Information Engineering

Abstract/Summary:
With the rapid development of modern geospatial information technology, spatial datasets have grown at a remarkable rate. Such large and complex datasets urgently require a technological solution for organizing, storing, processing, and managing them efficiently. Hadoop offers a programming model that can store and access massive data through parallel computation. Building on these characteristics of Hadoop, this thesis carries out the following studies and experiments:

(1) Current mainstream methods for spatial data management are surveyed, and the three core technologies of the Hadoop architecture, namely the distributed file system HDFS, the parallel programming model MapReduce, and the distributed database HBase, are introduced.

(2) Based on an analysis of the HDFS file storage mechanism and of vector data structures, a storage schema for keeping vector data in HDFS is designed, and fundamental vector operations are implemented on it. An improved method is then developed under the MapReduce model to address the problem that the traditional ray-casting method is not well suited to testing whether a large number of points lie inside a polygon (a minimal sketch of the per-point test appears after this abstract). Finally, the vector data produced by the MapReduce jobs are managed with the distributed database HBase.

(3) Through comparison and analysis of HDFS storage methods for small files, a storage schema based on sequence file technology is designed, and HBase data tables are used to manage massive raster data, improving the efficiency of storing, reading, and writing it. In addition, the K-means clustering algorithm is adapted to the MapReduce programming model for efficient concurrent image clustering.

(4) A Hadoop computing platform is set up, and experiments on point-in-polygon testing, file storage, and image clustering are carried out. The experimental results show that the proposed storage schemas can manage massive spatial data efficiently.
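To make item (2) concrete, the following minimal Java sketch shows the ray-casting (even-odd) point-in-polygon test that each map task could apply independently to the points in its input split. The class and method names are illustrative only and are not taken from the thesis; the thesis's actual MapReduce formulation is not reproduced here.

```java
// Minimal sketch: ray-casting (even-odd rule) point-in-polygon test.
// In a MapReduce setting, each mapper would run this test for every
// point in its input split against the broadcast polygon.
public class PointInPolygon {

    // Returns true if point (px, py) lies inside the polygon whose
    // vertices are given in order by xs/ys (not closed). A horizontal
    // ray is cast from the point and edge crossings are counted.
    static boolean contains(double[] xs, double[] ys, double px, double py) {
        boolean inside = false;
        int n = xs.length;
        for (int i = 0, j = n - 1; i < n; j = i++) {
            boolean crosses = (ys[i] > py) != (ys[j] > py)
                    && px < (xs[j] - xs[i]) * (py - ys[i]) / (ys[j] - ys[i]) + xs[i];
            if (crosses) {
                inside = !inside;
            }
        }
        return inside;
    }

    public static void main(String[] args) {
        // Unit square; the first test point is inside, the second is not.
        double[] xs = {0, 1, 1, 0};
        double[] ys = {0, 0, 1, 1};
        System.out.println(contains(xs, ys, 0.5, 0.5)); // true
        System.out.println(contains(xs, ys, 1.5, 0.5)); // false
    }
}
```

Because each point is tested independently, the work partitions naturally across map tasks, which is what makes the batch point-in-polygon problem a good fit for the MapReduce model described above.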
Keywords/Search Tags: Massive Spatial Data, Hadoop, Cluster, Parallel Computation, K-means