Design And Implementation Of Highly Efficient Indexing System For Spark-based Remote Sensing Big Data Processing

Posted on:2021-01-06

Degree:Master

Type:Thesis

Country:China

Candidate:J P Xiong

Full Text:PDF

GTID:2392330623465036

Subject:Computer technology

Abstract/Summary:

With the development of modern remote sensing technology, massive amounts of multi-source, heterogeneous remote sensing data are being generated. The storage and mining of remote sensing big data have therefore become central concerns in remote sensing applications, and more and more remote sensing data processing systems are being migrated to big data and cloud computing environments, such as Google Earth Engine (GEE) and ArcGIS Online. Because big data systems offer powerful computing capabilities and excellent scalability, they make it possible to explore remote sensing data in depth, make full use of its value, and realize real-time forecasting and monitoring of environmental disasters on the ground. Storage for remote sensing big data covers three parts: the raw data, the metadata, and the indexing system. The system stores the metadata in PostgreSQL and the raw data in HDFS, and then builds an indexing system on top of the raw data store to speed up data queries. However, because different queries involve different data, a single indexing system often delivers different query efficiency as the amount of data varies. Moreover, in most remote sensing systems the design and implementation of the indexing system does not take advantage of the mechanisms or components of the underlying system itself, which further reduces query efficiency and resource utilization.

This research therefore proposes a new storage system, MIGIS. MIGIS integrates several indexing algorithms, including quadtree, GeoHash, and orthogonal list, by exploiting HDFS's multi-replica mechanism during data queries, and it optimizes the various index data structures and query modes to increase query speed. The corresponding query algorithm is selected according to the number of queries, which effectively overcomes the inefficiency of a single index across different query numbers. Besides offering faster queries, the system ensures data integrity through the indexing system, connects the computing system and the outside world through a unified interface, and uses a Kafka message queue to reduce the service pressure on the server. A prototype of MIGIS was implemented on three Spark clusters composed of HDFS, Spark, Kafka, YARN, ZooKeeper, PostgreSQL, and Redis, and its performance was evaluated using remote sensing data from the Central Asian Ecology and Environment Research Center of the Chinese Academy of Sciences. Our empirical studies show that, under different query requirements, query speed can reach up to twice that of traditional indexing technologies. In addition, compared with a single indexing technology, computation time is reduced by about 2%, depending on the indexing technology.
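The abstract names GeoHash as one of the indexes MIGIS combines but gives no implementation details. As a rough illustration only, the sketch below shows the standard GeoHash encoding: longitude and latitude bits are interleaved by repeatedly bisecting each coordinate's interval, and every five bits become one base-32 character, so points that share a prefix fall in the same grid cell. The helper names `geohash_encode` and `build_prefix_index`, and the scene IDs and coordinates in the usage block, are hypothetical and not taken from the thesis; how MIGIS actually keys its data is an open assumption here.

```python
from __future__ import annotations

from collections import defaultdict

# Standard GeoHash base-32 alphabet (omits a, i, l, o).
_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_encode(lat: float, lon: float, precision: int = 8) -> str:
    """Encode a WGS84 point as a GeoHash string of `precision` characters."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars: list[str] = []
    bits, n_bits = 0, 0
    use_lon = True  # bits alternate, starting with longitude
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits, lon_lo = (bits << 1) | 1, mid
            else:
                bits, lon_hi = bits << 1, mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits, lat_lo = (bits << 1) | 1, mid
            else:
                bits, lat_hi = bits << 1, mid
        use_lon = not use_lon
        n_bits += 1
        if n_bits == 5:  # every 5 bits become one base-32 character
            chars.append(_BASE32[bits])
            bits, n_bits = 0, 0
    return "".join(chars)


def build_prefix_index(scenes: dict[str, tuple[float, float]],
                       precision: int = 5) -> dict[str, list[str]]:
    """Hypothetical helper: group scene IDs by the GeoHash cell of their (lat, lon)."""
    index: defaultdict[str, list[str]] = defaultdict(list)
    for scene_id, (lat, lon) in scenes.items():
        index[geohash_encode(lat, lon, precision)].append(scene_id)
    return dict(index)


if __name__ == "__main__":
    # Hypothetical scene IDs and centre coordinates, for illustration only.
    scenes = {"scene_a": (42.60, -5.60), "scene_b": (42.61, -5.59),
              "scene_c": (43.10, -4.00)}
    print(geohash_encode(42.6, -5.6, 5))  # ezs42
    # scene_a and scene_b share the 4-character cell "ezs4".
    print(build_prefix_index(scenes, precision=4))
```

Because a longer GeoHash always extends the shorter one for the same point, a spatial lookup can be answered with a prefix match on the key, which is one reason GeoHash suits key-value and file-based stores.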
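The quadtree index is likewise only named in the abstract. The following is a minimal point-quadtree sketch, assuming each remote sensing scene is indexed by one representative point such as its centre coordinate (the thesis may index footprints or tiles differently). A node holds up to `capacity` entries and splits into four quadrants when full, and a range query descends only into quadrants whose bounds intersect the query box. The classes `Box` and `QuadTree`, the `capacity` and `max_depth` parameters, and their defaults are illustrative choices, not the design described in the thesis.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Box:
    """Axis-aligned rectangle, e.g. (min_lon, min_lat, max_lon, max_lat)."""
    min_x: float
    min_y: float
    max_x: float
    max_y: float

    def contains(self, x: float, y: float) -> bool:
        # Inclusive on all edges, so points on a split line are never lost.
        return self.min_x <= x <= self.max_x and self.min_y <= y <= self.max_y

    def intersects(self, other: Box) -> bool:
        return not (other.min_x > self.max_x or other.max_x < self.min_x
                    or other.min_y > self.max_y or other.max_y < self.min_y)


class QuadTree:
    """Point quadtree: each leaf holds up to `capacity` entries, then splits."""

    def __init__(self, bounds: Box, capacity: int = 4,
                 depth: int = 0, max_depth: int = 16) -> None:
        self.bounds = bounds
        self.capacity = capacity
        self.depth = depth
        self.max_depth = max_depth
        self.points: list[tuple[float, float, Any]] = []
        self.children: list[QuadTree] | None = None

    def insert(self, x: float, y: float, payload: Any) -> bool:
        if not self.bounds.contains(x, y):
            return False
        if self.children is None:
            # Leaf: store here unless full; at max_depth, store regardless.
            if len(self.points) < self.capacity or self.depth >= self.max_depth:
                self.points.append((x, y, payload))
                return True
            self._split()
        # Internal node: hand the entry to the first quadrant that accepts it.
        for child in self.children:
            if child.insert(x, y, payload):
                return True
        return False

    def _split(self) -> None:
        b = self.bounds
        mx, my = (b.min_x + b.max_x) / 2, (b.min_y + b.max_y) / 2
        quads = [Box(b.min_x, b.min_y, mx, my), Box(mx, b.min_y, b.max_x, my),
                 Box(b.min_x, my, mx, b.max_y), Box(mx, my, b.max_x, b.max_y)]
        self.children = [QuadTree(q, self.capacity, self.depth + 1, self.max_depth)
                         for q in quads]
        old, self.points = self.points, []
        for x, y, payload in old:  # push existing entries down one level
            for child in self.children:
                if child.insert(x, y, payload):
                    break

    def query(self, box: Box) -> list[Any]:
        """Return payloads of all entries inside `box`, pruning disjoint quadrants."""
        if not self.bounds.intersects(box):
            return []
        found = [p for x, y, p in self.points if box.contains(x, y)]
        for child in self.children or []:
            found.extend(child.query(box))
        return found
```

The `max_depth` guard keeps many entries at an identical coordinate from triggering endless splits; such entries simply accumulate in the deepest leaf.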

Keywords/Search Tags:

Remote Sensing Data, Big Data, Multiple Indexes, Spark

Related items

1	Research On Remote Sensing Data Distribution And Sharing Strategy Under Multiple Data Source Mode
2	Research On Management And Service Technology Of Remote Sensing Image Big Data Based On GeoTrellis
3	The Design & Implementation Of Data Receiving And Processing System For High Resolution Remote Sensing Image
4	Restoration And Classification Of Remote Sensing Imagery Based On Multiple Feature Learning
5	Research And Implementation Of Ocean Remote Sensing Data Transferring System
6	Research On Multi-source Heterogeneous Remote Sensing Data Acquisition And Remote Sensing Model Processing Technology
7	Research And Application Of Retrieval Strategy For High Resolution Remote Sensing Image Data
8	Design Optimization And Implementation Of Remote Sensing Satellite Ground System Data Receiving Subsystem
9	Management And Visualization Of Remote Sensing Big Data Information Based On WebGIS
10	Research On System Integration Of Multiple Centers Remote Sensing Data Product Collaborative Production