Geospatial Data Management System Based On Hadoop

Posted on: 2019-01-12
Degree: Master
Type: Thesis
Country: China
Candidate: Y Zhang
Full Text: PDF
GTID: 2370330572955619
Subject: Computer system architecture
Abstract/Summary:
Spatial data has played an important role in all walks of life for years. With the development of modern science and technology, Geographic Information Systems (GIS) have become increasingly mature. GIS has transformed the traditional ways of processing and using geographic data, and has made collecting, processing, and accessing geographic information far more convenient. With the "Internet Plus" concept and the spread of intelligent devices, a large number of devices are constantly producing spatial information. Processing this data in real time and supporting access to it is particularly important in the era of data explosion.

At present, indexing technology for spatial vector data is relatively mature, but it mainly targets historical archive data. Such an index is built once over historical vector data and offers high query efficiency, but it cannot guarantee that efficiency while data is being inserted dynamically. To store the spatial data generated every day and to support its access and analysis, this thesis studies a spatial data storage schema and retrieval strategy based on HBase, and proposes a new index structure that can efficiently access all data while sustaining a certain data insertion throughput. Based on Hadoop, we implement a prototype spatial data management system named GISdoop that uses this self-built index.

The main objectives of this thesis are as follows:

(1) An HBase storage schema for vector data is designed so that the system can store the relevant information of vector elements. This information consists primarily of the geometry, the geometry type, the element ID, and all additional attributes of the element, such as road names for road elements.
(2) To address the drawback that vector indexes usually support only historical data, a self-built index is introduced to improve the efficiency of retrieving vector elements. To cope with constant data generation, the index supports dynamic insertion of vector elements. The index changes as data is inserted, and query efficiency remains relatively high as the amount of data grows.
(3) The index structure is applied to the Hadoop platform and implemented through the coprocessor mechanism, which solves the problem that the Hadoop platform does not natively support spatial data.
(4) The efficiency of range queries is improved through a range query algorithm based on the self-built index, which takes full advantage of HBase's strengths in handling large data.
(5) A new k-nearest-neighbor (kNN) query algorithm is designed based on the self-built index. The client efficiently obtains coarsely filtered data from HBase, then performs fine filtering to obtain the final result.

Based on the above design, this thesis implements the self-built index structure on an experimental cluster and tests insertion efficiency, range query efficiency, and kNN query efficiency on different datasets. The experiments show that GISdoop achieves an insertion rate of 70,000 inserts per second on a small-scale cluster, and that commonly used spatial range queries and kNN queries complete in hundreds of milliseconds. Therefore, GISdoop can satisfy the demand for real-time data insertion and real-time spatial querying in location-related applications.
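The abstract does not spell out the self-built index, so the following is only an illustrative sketch of the general idea of coarse filtering by key prefix followed by fine filtering on exact coordinates. It assumes a standard Geohash prefix as the leading part of the row key (Geohash appears in the keywords, but its exact role in GISdoop is an assumption here). The names `geohash_encode`, `make_row_key`, and `range_query` are hypothetical, and a plain Python dict stands in for an HBase table and its prefix scan.

```python
import os

_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_encode(lat, lon, precision=6):
    """Standard Geohash: interleave longitude/latitude bisection bits
    (longitude first) and emit one base32 character per 5 bits."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars, bits, bit_count = [], 0, 0
    use_lon = True
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits = (bits << 1) | 1
                lon_lo = mid
            else:
                bits <<= 1
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits = (bits << 1) | 1
                lat_lo = mid
            else:
                bits <<= 1
                lat_hi = mid
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:
            chars.append(_BASE32[bits])
            bits, bit_count = 0, 0
    return "".join(chars)


def make_row_key(lat, lon, element_id, precision=6):
    """Hypothetical row key layout: <geohash>_<element id>.
    Nearby points share a key prefix, and new points can be
    inserted at any time without rebuilding anything."""
    return f"{geohash_encode(lat, lon, precision)}_{element_id}"


def range_query(rows, min_lat, min_lon, max_lat, max_lon, precision=6):
    """rows maps row key -> (lat, lon); it stands in for an HBase table."""
    sw = geohash_encode(min_lat, min_lon, precision)
    ne = geohash_encode(max_lat, max_lon, precision)
    # The cell named by the corners' common prefix contains the whole box.
    prefix = os.path.commonprefix([sw, ne])
    # Coarse filter: stand-in for a server-side prefix scan.
    coarse = {k: p for k, p in rows.items() if k.startswith(prefix)}
    # Fine filter: exact bounding-box test on the candidates.
    return sorted(
        k for k, (lat, lon) in coarse.items()
        if min_lat <= lat <= max_lat and min_lon <= lon <= max_lon
    )


if __name__ == "__main__":
    rows = {}
    for eid, lat, lon in [("a", 57.64911, 10.40744),
                          ("b", 57.65, 10.41),
                          ("c", -33.86, 151.21)]:
        rows[make_row_key(lat, lon, eid)] = (lat, lon)
    print(range_query(rows, 57.0, 10.0, 58.0, 11.0))
```

A single common prefix is the simplest possible coarse filter: it is always correct but can degrade to a full scan when the query box straddles a major cell boundary. A practical index would cover the box with several smaller cells instead.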
Keywords/Search Tags:GIS, Vector Data, HBase, Spatial Index, Geohash