Efficient Learning Spatial-temporal Query And Computing Framework For Geographic Flow Data

Posted on: 2022-09-11
Degree: Doctor
Type: Dissertation
Country: China
Candidate: L S Hu
GTID: 1480306722455424
Subject: Remote sensing and geographic information systems
Abstract/Summary:
Geographic flow data describes the interaction between geo-referenced objects. It not only carries space-time distance information but also contains the association patterns between geographic locations, and it expresses the complex flow patterns in geographic cyberspace. Mining geographic flow data is an important way to understand the pattern and function of geographic systems and to clarify the dynamic mechanisms of their evolution. With the continuous development of data acquisition technology, geographic flow data is growing explosively, forming geographic flow big data. Geographic flow big data provides a solid data foundation for solving the basic problems of geography at "larger scale, finer granularity and full dimension". However, because its samples give "super" coverage of the research object, its "large volume and high dimension" characteristics also pose a huge efficiency challenge to traditional GIS methods. How to efficiently mine and analyze large-scale, high-dimensional geographic flow data has become one of the pain points restricting the field's rapid development. This dissertation therefore addresses two problems: existing query methods for high-dimensional geographic flow data are constrained because "index space expands with the expansion of data volume, and index efficiency decays with the expansion of data volume", and existing spatio-temporal distributed computing frameworks ignore the attributes of geographic elements, so they cannot effectively support efficient exploration and analysis of high-dimensional geographic flow data. The research contents are as follows:

(1) Analysis of the influence of geographic flow data distribution on the learning index. This dissertation summarizes the general paradigm of constructing and querying a high-dimensional learning spatio-temporal index, analyzes the core factors affecting the query efficiency of the index model, and provides a research direction for optimizing the learning index. A data-driven index structure adapts differently to data with different distributions, and existing research has not systematically explored this problem for the learning index. From the perspective of index model accuracy and index migration characteristics, this dissertation analyzes and verifies the impact of geographic flow data repeatability, data distribution and data volume on index model accuracy and on the index's accurate query range, explores the impact of data feature repeatability on index migration characteristics, and points out a new research direction for learning index optimization.

(2) Optimized construction and query of a multi-directional learning spatio-temporal index considering geographic flow data distribution. The dimensionality-reduction curve method cannot effectively express the aggregation characteristics of high-dimensional data, and it struggles to support range retrieval over flexible combinations of dimensions. In addition, the training loss function of existing index models cannot effectively express the global error range. To solve these problems, this dissertation proposes a method for constructing and querying a spatio-temporal multi-directional learning index that takes data distribution into account. An index model is constructed separately in each dimension of the data. Because the data repetition rate is inversely proportional to the efficiency of the index model and the uniformity of the data distribution is directly proportional to it, the original data space is mapped to the target space to be retrieved using preaggregation and uniform standardization, and a range-loss is used to train the index model of each dimension. During a query, single-dimension results are obtained from the query range of each dimension and then cross-checked to produce the final retrieval target. The experimental results show that the proposed index has a clear efficiency advantage over traditional tree indexes: with 5 million data points, the average retrieval time of a spatio-temporal range query is only 10% of that of Quad Tree and 20% of that of STR*-Tree.

(3) A distributed computing framework for large-scale high-dimensional geographic flow data. Existing computing frameworks for large-scale spatio-temporal data focus only on the spatio-temporal characteristics of data and ignore the attributes of geographic elements, which makes them difficult to apply directly to geographic flow data analysis and mining. To solve this problem, this dissertation takes advantage of distributed parallel computing to design and implement a new computing framework, GeoFlow-FEAF, for efficient and complex analysis of geographic flow data. The programming interface of the framework has geographic semantic characteristics, adapts to various storage backends for complex geographic flow data, and lets users build a global data partition index and a local index structure within each partition on demand. By extending an online distributed memory engine, the framework also supports interactive exploration and analysis of large-scale geographic flow data.

(4) Efficient processing of large-scale geographic flow data based on GeoFlow-FEAF. Visualizing and publishing urban road conditions is a computing-intensive process. This dissertation uses GeoFlow-FEAF to efficiently process massive geographic flow data, taking data on China's main roads as an example to verify an efficient pyramid tile generation algorithm. The experimental results show that the map tile generation algorithm based on GeoFlow-FEAF is efficient and highly scalable.

This dissertation designs and implements a high-dimensional learning spatio-temporal index that considers the distribution of geographic flow data, together with a distributed parallel computing framework for efficient analysis of large-scale geographic flow data, which provides effective support for geographic flow data mining, analysis and application.
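
The per-dimension construction and cross-checked query described in item (2) can be illustrated with a minimal Python sketch. It is not the dissertation's implementation: the class names `LearnedIndex1D` and `MultiDirectionalIndex` are hypothetical, an ordinary least-squares line stands in for the range-loss, and the preaggregation and uniform standardization steps are omitted. Each dimension gets a linear model that predicts a key's position in a sorted array, plus an error bound that limits the local search window; a range query intersects the per-dimension results.

```python
import math
from bisect import bisect_left, bisect_right


class LearnedIndex1D:
    """Linear model mapping key -> position in a sorted array (hypothetical sketch)."""

    def __init__(self, values):
        if not values:
            raise ValueError("values must be non-empty")
        # Point ids ordered by this dimension's value, and the sorted keys.
        self.order = sorted(range(len(values)), key=lambda i: values[i])
        self.keys = [values[i] for i in self.order]
        n = len(self.keys)
        # Least-squares fit of position on key (assumption: replaces the range-loss).
        mean_k = sum(self.keys) / n
        mean_p = (n - 1) / 2
        var = sum((k - mean_k) ** 2 for k in self.keys)
        cov = sum((k - mean_k) * (p - mean_p) for p, k in enumerate(self.keys))
        self.slope = cov / var if var else 0.0
        self.intercept = mean_p - self.slope * mean_k
        # Worst-case prediction error, padded by one slot for keys not in the data.
        worst = max(abs(self.slope * k + self.intercept - p)
                    for p, k in enumerate(self.keys))
        self.err = math.ceil(worst) + 1

    def _search(self, key, bisect_fn):
        # Binary search only inside the window around the predicted position.
        center = math.floor(self.slope * key + self.intercept)
        lo = max(0, center - self.err)
        hi = min(len(self.keys), center + self.err + 1)
        return bisect_fn(self.keys, key, lo, hi)

    def range_ids(self, low, high):
        """Ids of points whose value in this dimension lies in [low, high]."""
        start = self._search(low, bisect_left)
        stop = self._search(high, bisect_right)
        return set(self.order[start:stop])


class MultiDirectionalIndex:
    """One LearnedIndex1D per dimension; queries intersect per-dimension hits."""

    def __init__(self, points):
        self.points = points
        dims = len(points[0])
        self.indexes = [LearnedIndex1D([p[d] for p in points]) for d in range(dims)]

    def range_query(self, bounds):
        # bounds: {dimension: (low, high)}; dimensions left out are unconstrained.
        result = None
        for d, (low, high) in bounds.items():
            ids = self.indexes[d].range_ids(low, high)
            result = ids if result is None else result & ids
        return sorted(result) if result is not None else list(range(len(self.points)))


# Toy (x, y, t) points; query x in [1.0, 2.5] and t in [2, 4], leaving y free.
points = [(0.0, 0.0, 1), (1.0, 2.0, 2), (2.0, 1.0, 3), (3.0, 3.0, 4), (1.5, 1.5, 5)]
index = MultiDirectionalIndex(points)
print(index.range_query({0: (1.0, 2.5), 2: (2, 4)}))  # [1, 2]
```

Because each dimension is indexed on its own, a query may constrain any subset of dimensions, which is one way to read the "flexible dimension combination" requirement above. The price is the set intersection at the end, whose cost grows with the size of the per-dimension result sets.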
Keywords/Search Tags: Geographic flow, Geographic flow big data, Spatio-temporal range query, Learning Spatio-temporal index, Distributed Parallel computing