Research On Cloud Storage Of Vector Spatial Data And Parallelization Of MCL Algorithm

Posted on: 2015-12-31
Degree: Master
Type: Thesis
Country: China
Candidate: D L Lei
Full Text: PDF
GTID: 2180330461973602
Subject: Cartography and Geographic Information System
Abstract/Summary:
With the rapidly growing volume and complexity of spatial data, efficient storage and processing of massive geospatial data has become an urgent research problem in GIScience and related fields. Vector spatial data is particularly challenging to store and process because of the complexity of its representation, access, and analysis. How to manage massive geospatial data efficiently and discover useful and interesting knowledge from it has become an increasingly active topic in geographic information science. At the same time, the development of cloud computing and cloud storage technologies (such as MongoDB, HBase, and Hadoop) offers new ways to store and process vector geospatial data.

The content and contributions of this thesis are as follows:

1. The state of research on the theory and technology of vector spatial data storage and parallel clustering is summarized. The characteristics of NoSQL databases are discussed, and the characteristics and application scenarios of MongoDB are described. The theory and technology of the MapReduce model, classical clustering algorithms, and graph clustering algorithms are also introduced.

2. Given the diversity of vector geospatial data in formats and attributes, the thesis discusses several problems in vector geospatial data storage, including multi-user storage and management, heterogeneous vector data storage, and computation over very large volumes of vector data. An approach and a system for cloud-based vector data storage and analysis are presented. The system extends MongoDB and integrates the Hadoop framework for parallel spatial data processing and analysis. Built on a three-layer browser-server architecture, it consists of a suite of modules for data storage, conversion, query, and analysis. The OGR Simple Features Library is integrated to convert data between MongoDB and various vector spatial data formats. The MongoDB Connector for Hadoop (mongo-hadoop) is used to transfer data between MongoDB and the Hadoop MapReduce model.

3. Optimization methods for designing graph algorithms in the MapReduce framework are discussed. Given the complexity of the topological relationships in graph structures, the thesis considers three such methods: message passing, local aggregation, and aggregation in the mapper. Building on these methods, a parallel algorithm based on the MCL algorithm and the MapReduce framework is designed.

4. An experiment on five physical servers compares the performance of VectorDB with PostGIS for reading, writing, and querying vector data. Preliminary results indicate that, although VectorDB is slightly slower for data writing, it has a significant advantage over PostGIS in data access and spatial query. The thesis also compares VectorDB with MongoDB on massive vector data processing; the results show that VectorDB performs better. In tests of the scalability and efficiency of the parallel MCL algorithm, the results show that the algorithm performs well.
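
For readers unfamiliar with MCL (the Markov Cluster algorithm), the following is a minimal serial sketch of its standard form: alternate expansion (squaring a column-stochastic matrix) with inflation (raising entries to a power and renormalizing each column) until the matrix stops changing. It is not the parallel MapReduce version designed in the thesis. The function names, the dense list-of-lists representation, the added self-loops, and the default parameters (`inflation=2.0`, `max_iter=100`, `tol=1e-9`) are all illustrative assumptions.

```python
def normalize_columns(m):
    """Scale each column so that it sums to 1 (column-stochastic matrix)."""
    n = len(m)
    out = [[0.0] * n for _ in range(n)]
    for j in range(n):
        total = sum(m[i][j] for i in range(n))
        for i in range(n):
            out[i][j] = m[i][j] / total if total else 0.0
    return out


def expand(m):
    """Expansion step: square the matrix (flow along two-step random walks)."""
    n = len(m)
    return [[sum(m[i][k] * m[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def inflate(m, r):
    """Inflation step: raise every entry to the power r, then renormalize."""
    return normalize_columns([[v ** r for v in row] for row in m])


def mcl(adjacency, inflation=2.0, max_iter=100, tol=1e-9):
    """Cluster an undirected graph given as a dense 0/1 adjacency matrix.

    Parameter defaults are assumptions chosen for this sketch.
    """
    n = len(adjacency)
    # Self-loops are a common addition that stabilizes the iteration.
    m = [[adjacency[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    m = normalize_columns(m)
    for _ in range(max_iter):
        nxt = inflate(expand(m), inflation)
        delta = max(abs(nxt[i][j] - m[i][j])
                    for i in range(n) for j in range(n))
        m = nxt
        if delta < tol:
            break
    # Each row that still carries flow belongs to an attractor; the columns
    # with non-negligible entries in that row form one cluster.
    clusters = set()
    for i in range(n):
        members = frozenset(j for j in range(n) if m[i][j] > 1e-6)
        if members:
            clusters.add(members)
    return sorted(sorted(c) for c in clusters)


if __name__ == "__main__":
    # Hypothetical input: two triangles joined by a single bridge edge (2-3).
    graph = [
        [0, 1, 1, 0, 0, 0],
        [1, 0, 1, 0, 0, 0],
        [1, 1, 0, 1, 0, 0],
        [0, 0, 1, 0, 1, 1],
        [0, 0, 0, 1, 0, 1],
        [0, 0, 0, 1, 1, 0],
    ]
    print(mcl(graph))
```

In this formulation, inflation works on one column at a time, and each output column of the expansion depends on only one column of the right-hand operand. That column-wise independence is what makes the algorithm a natural candidate for a parallel framework such as MapReduce.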
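
The "aggregation in the mapper" optimization mentioned in item 3 can be illustrated with a small in-memory simulation. The sketch below does not use Hadoop and is not taken from the thesis. It uses a hypothetical task, counting node degrees from an edge list, to compare a naive mapper that emits one pair per edge endpoint with a mapper that sums locally and emits once per node. All names (`map_naive`, `map_in_mapper_combining`, `run_job`) are assumptions made for illustration.

```python
from collections import defaultdict


def map_naive(edges):
    """Naive mapper: emit one (node, 1) pair for every edge endpoint."""
    for u, v in edges:
        yield u, 1
        yield v, 1


def map_in_mapper_combining(edges):
    """Mapper with in-mapper aggregation: sum locally, emit once per node."""
    partial = defaultdict(int)
    for u, v in edges:
        partial[u] += 1
        partial[v] += 1
    yield from partial.items()


def reduce_sum(pairs):
    """Reducer: add up all values that share a key."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


def run_job(mapper, splits):
    """Run the mapper on each input split, then reduce everything it emitted.

    Returns the reduced result and the number of intermediate pairs, which
    stands in for the volume of data shuffled between map and reduce.
    """
    emitted = [pair for split in splits for pair in mapper(split)]
    return reduce_sum(emitted), len(emitted)


# Hypothetical input: an edge list divided into two splits.
splits = [
    [(0, 1), (0, 2), (1, 2)],
    [(2, 3), (3, 4), (3, 5), (4, 5)],
]

naive_degrees, naive_pairs = run_job(map_naive, splits)
combined_degrees, combined_pairs = run_job(map_in_mapper_combining, splits)
print(naive_degrees == combined_degrees, naive_pairs, combined_pairs)
```

Both mappers produce the same degrees, but the aggregating mapper emits fewer intermediate pairs. The trade-off is that it must hold its partial sums in memory until it has read the whole split.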
Keywords/Search Tags:Cloud Storage, MongoDB, Vector spatial data, MCL, MapReduce