
Design And Implementation Of Highly Efficient Indexing System For Spark-based Remote Sensing Big Data Processing

Posted on: 2021-01-06
Degree: Master
Type: Thesis
Country: China
Candidate: J P Xiong
Full Text: PDF
GTID: 2392330623465036
Subject: Computer technology
Abstract/Summary:
With the development of modern remote sensing technology, a massive amount of multi-source, heterogeneous remote sensing data is being generated. As a result, the storage and mining of remote sensing big data have become central concerns in remote sensing applications, and more and more remote sensing data processing systems are being migrated to big data and cloud computing environments, such as Google Earth Engine (GEE) and ArcGIS Online. With the powerful computing capabilities and scalability of big data systems, we can mine remote sensing data in depth, make full use of these resources, and realize real-time forecasting and monitoring of environmental disasters on the ground. Storing remote sensing big data mainly involves three parts: raw data storage, metadata storage, and the indexing system. The system stores the metadata in PostgreSQL and the raw data in HDFS, and then builds an index on top of the raw data store to speed up data queries. However, because the data involved differs from one query operation to another, a single indexing system often shows different query efficiency as the amount of data changes. In most remote sensing systems, the design and implementation of the indexing system do not take advantage of the mechanisms or components of the underlying system itself, which further reduces query efficiency and resource utilization. Therefore, this research proposes a new storage system, MIGIS. The system integrates several indexing algorithms, including quadtree, GeoHash, and orthogonal list, by exploiting HDFS's multi-copy mechanism during data query, and it optimizes the index data structures and query modes to increase query speed. The query algorithm is selected according to the query volume, which effectively overcomes the inefficiency of a single index across different query volumes. The system not only offers faster queries, but also ensures the integrity of the data through the indexing system, connects to the computing system and the outside world through a unified interface, and uses a Kafka message queue to reduce the service load on the server. We implemented a prototype of the proposed MIGIS remote sensing processing system on three Spark clusters, composed of HDFS, Spark, Kafka, Yarn, Zookeeper, PostgreSQL, and Redis, and used remote sensing data from the Central Asian Ecology and Environment Research Center of the Chinese Academy of Sciences to evaluate its performance. Our empirical studies show that the query speed under different query requirements can reach up to twice that of traditional indexing technologies. In addition, compared with a single indexing technology, the calculation time can be reduced by about 2%, depending on the index technology.
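GeoHash, one of the index types named in the abstract, maps a latitude/longitude pair to a short string so that nearby locations share a common prefix. The following is a minimal sketch of the standard GeoHash encoding only; it is not taken from the thesis, and the function name `geohash_encode` and its `precision` parameter are illustrative assumptions rather than part of MIGIS.

```python
# Standard GeoHash base-32 alphabet (digits plus letters, without a, i, l, o).
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_encode(lat: float, lon: float, precision: int = 8) -> str:
    """Encode a point as a GeoHash string of `precision` characters.

    Hypothetical helper for illustration; not the thesis's implementation.
    """
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    bits = 0          # accumulates the current 5-bit group
    bit_count = 0     # number of bits collected in the current group
    use_lon = True    # GeoHash interleaves bits, starting with longitude

    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits = (bits << 1) | 1
                lon_lo = mid
            else:
                bits <<= 1
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits = (bits << 1) | 1
                lat_lo = mid
            else:
                bits <<= 1
                lat_hi = mid
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:
            # Every 5 bits become one base-32 character.
            chars.append(BASE32[bits])
            bits = 0
            bit_count = 0
    return "".join(chars)


if __name__ == "__main__":
    # A 3-character cell for a point in northern Denmark.
    print(geohash_encode(57.64911, 10.40744, 3))  # u4p
```

Because each additional character only refines the same cell, a shorter hash is always a prefix of a longer hash for the same point, which is what makes prefix matching usable as a coarse spatial filter.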
Keywords/Search Tags:Remote Sensing Data, Big Data, Multiple Indexes, Spark