Efficient Storage And Parallel Overlay Analysis Of Massive Vector Data In The Cloud Computing Environment

Posted on: 2021-02-12
Degree: Master
Type: Thesis
Country: China
Candidate: Y Y Jiang
Full Text: PDF
GTID: 2430330620480150
Subject: Surveying and mapping engineering
Abstract/Summary:
As more large scientific facilities are built and major scientific experiments are carried out, scientific research has entered an unprecedented era of big data. The spatial big data sets generated in this era pose many challenges for the efficient storage and computation of massive vector data. The traditional solution manages the data jointly through a relational database and ArcSDE. This approach stores data on a single machine, which severely limits both storage capacity and computing power. The distributed storage and high-performance parallel computing technologies available in a cloud computing environment offer an effective alternative. Building on these capabilities, this thesis studies how to store and process massive vector data efficiently in a cloud computing environment. The research focuses on the storage model, index construction, fast data import, fast query, and load balancing in parallel geographic computing for massive vector data on the Hadoop cloud platform. The work covers the following aspects:

(1) First, starting from the research background and the project on which the work is based, the thesis reviews research progress in three related fields: geospatial big data storage technology, parallelization of geospatial analysis algorithms, and load balancing for geospatial big data. It comprehensively analyzes the current state of research and applications, in China and abroad, of distributed storage and high-performance parallel computing for geospatial big data in the cloud environment. It also gives a detailed overview of the relevant technical theory, which provides the theoretical and technical foundation for the subsequent research.

(2) Second, an organization and storage strategy for massive vector data is built on HBase, the distributed non-relational database of the Hadoop cloud platform. Because vector data are unevenly distributed in space, a non-uniform grid partitioning technique is used to design a multi-level grid index, and a Hilbert space-filling curve is used to order the cells of the non-uniform grid. The Hilbert code and the level number of the grid cell in which an object is located are combined into a Row Key that satisfies the storage rules of HBase. Based on the characteristics of vector data storage under HBase, the structures of the vector data storage table and the secondary index table are determined. A Spark-based parallel import method for vector data is also designed.

(3) Third, the load balancing strategy for parallel spatial analysis on the Hadoop cloud platform is studied, and a vector spatial data partitioning method that takes computational complexity into account is proposed. Traditional partitioning methods for vector spatial data in parallel spatial analysis do not reflect the actual amount of computation and easily lead to data skew. To address this problem, the thesis considers both the structure of vector spatial data and the characteristics of spatial analysis algorithms, and studies computational complexity models for data-intensive and computation-intensive spatial analysis algorithms to guide the balanced partitioning of vector spatial data. By analyzing the principles and characteristics of different algorithm types, the factors that affect algorithm efficiency are selected to construct a computational complexity model for vector spatial data. A vector data partitioning strategy is then designed on the basis of this complexity model.

(4) Finally, a prototype system for massive vector data storage and computing is designed and developed on the Hadoop cloud platform to realize distributed storage and high-performance computing of massive vector data. Using the prototype system, experiments verify the correctness and effectiveness of the parallel vector data import, the parallel query, and the data partitioning strategy that takes computational complexity into account.
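
The row key design described in aspect (2) can be illustrated with a minimal sketch. The abstract does not give the exact key layout, so everything below is an assumption rather than the thesis's method: level L is simplified to a regular 2^L x 2^L grid (the thesis uses a non-uniform grid), and the key format (zero-padded level, Hilbert code, and a hypothetical object_id) is chosen only so that the byte-wise lexicographic order in which HBase sorts row keys matches Hilbert order within a level. The names hilbert_index and make_row_key are illustrative.

```python
def hilbert_index(order: int, x: int, y: int) -> int:
    """Position of cell (x, y) along a Hilbert curve over a 2**order x 2**order grid."""
    n = 1 << order
    if not (0 <= x < n and 0 <= y < n):
        raise ValueError("cell lies outside the grid")
    d = 0
    s = n >> 1
    while s > 0:
        # Which quadrant of the current sub-square does the cell fall in?
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip so the next, finer level sees a canonically oriented curve.
        if ry == 0:
            if rx == 1:
                x, y = n - 1 - x, n - 1 - y
            x, y = y, x
        s >>= 1
    return d


def make_row_key(level: int, x: int, y: int, object_id: int) -> bytes:
    """Hypothetical row key: <level>_<Hilbert code>_<object id>, all zero-padded.

    Fixed-width decimal fields keep lexicographic order equal to numeric order.
    Twelve digits are enough for Hilbert codes up to level 19 in this sketch.
    """
    code = hilbert_index(level, x, y)
    return f"{level:02d}_{code:012d}_{object_id:010d}".encode("ascii")


if __name__ == "__main__":
    # Cells that are adjacent on the curve get adjacent row keys.
    for cell in [(0, 0), (1, 0), (1, 1), (0, 1), (0, 2)]:
        print(cell, make_row_key(2, cell[0], cell[1], object_id=1))
```

Because consecutive Hilbert codes always belong to neighbouring cells, objects that are close in space tend to land in nearby rows, which is the property that makes such a key attractive for range scans. Putting the level first is one possible choice; a different field order would cluster the data differently.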
Keywords/Search Tags: cloud computing environment, massive vector data, distributed storage, high-performance spatial analysis, load balancing technology