The Methods Of High Efficient Clustering On Spatiotemporal Big Data

Posted on: 2019-06-29
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y H Gu
GTID: 1310330545488228
Subject: Cartography and Geographic Information System
Abstract/Summary:
In recent years, with the rapid development of space-air-ground integrated stereoscopic observation technology in China, the volume of high-accuracy, high-frequency, large-coverage geographic spatiotemporal data has grown exponentially, making geographic spatiotemporal data mining increasingly urgent. As one of the most important data mining methods, clustering has become an active research topic. For massive geographic spatiotemporal data, especially high-resolution remote sensing images of ever-increasing resolution and spatiotemporal point data of ever-increasing scale, existing GIS clustering methods lack a unified framework model for expressing an efficient clustering process, and the theory and technology of spatiotemporal data clustering face severe challenges. Common clustering methods have several defects: no unified organization method for multi-source data, loss of data association after partitioning, long processing times for massive data, and unsatisfactory clustering results.

This dissertation therefore constructs a high-performance parallel clustering framework for geographic spatiotemporal data, covering data organization, the data warehouse model, data partitioning and the computing paradigm. Representative clustering methods are chosen for high-resolution remote sensing images and for spatiotemporal point data respectively. By taking into account the symmetry and spectral similarity in imagery, and the macroscopically continuous and microscopically discrete spatiotemporal patterns in spatiotemporal point data, it overcomes the defects of existing research and implements spatiotemporal clustering within the proposed parallel clustering framework. The specific research contents are as follows:

(1) A high-performance parallel clustering model for geographic spatiotemporal big data is designed. It comprises data organization, data storage, data partitioning and a computing paradigm, and expresses the parallel clustering process of vector and raster data in a uniform way. At the data organization level, a multi-dimensional hypercube data model is proposed that takes spatiotemporal characteristics, the bands and pixel values of raster data, and vector data as its organizational dimensions. At the data storage level, the hypercube model is further abstracted into the GeoTable structure of a distributed data warehouse model to achieve unified storage. At the data partitioning level, a mathematical expression and a spatial computation method for the multi-dimensional hypercube are designed based on algebraic theory, and a data partitioning model based on connection structure is established. At the parallel computing level, a matrix is introduced to describe the interactions between basic operations and communication modes during job execution and data transmission, and a paradigm for high-performance parallel computing is established.

(2) For raster big data, partition-based clustering algorithms are selected for high-resolution remote sensing imagery. The symmetry of ground objects and spectral similarity are taken into consideration, and the point symmetry distance similarity measure is improved. A globally optimal search algorithm based on a genetic algorithm is put forward. A distributed data structure for high-resolution remote sensing imagery based on connection structure is designed, and an efficient raster data clustering algorithm is implemented with the high-performance parallel clustering framework.

(3) For vector big data, density-based clustering algorithms are selected for spatiotemporal events. The probability of spatiotemporal event occurrence is fitted with a Poisson distribution, and the reachability of spatiotemporal events is redefined to build the spatiotemporal clustering model. Based on a variable time window, an ordered reachable time window distribution algorithm is proposed. A redundant perceptive grid in N-dimensional space is proposed to implement the data partitioning model based on connection structure, and an efficient vector data clustering algorithm is realized with the high-performance parallel clustering framework.

Research and experimental results show that the proposed framework, which combines high-performance computing technologies such as parallel and distributed computing with spatial computing theory, can effectively implement efficient clustering of large-scale geospatial data. The improved clustering algorithm for raster big data markedly enhances the extraction of symmetric ground objects and greatly improves the efficiency of the clustering procedure. The improved clustering algorithm for vector big data not only greatly improves clustering efficiency but also solves the problem of separation between the temporal and spatial domains. Furthermore, data sets containing clusters of multiple densities can be classified more accurately.
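The abstract does not define the redundant perceptive grid, so the following is only a minimal sketch of one common way to partition points for parallel density-based clustering without losing neighbours at partition borders. Each point is assigned to a home grid cell and is also copied (redundantly) into every adjacent cell it could influence within a search radius. The function name `partition_with_halo`, the parameters `cell_size` and `eps`, and the assumption that all dimensions (including time) are pre-scaled to a common unit are illustrative choices, not taken from the dissertation.

```python
from collections import defaultdict
from itertools import product
from math import floor


def partition_with_halo(points, cell_size, eps):
    """Partition N-dimensional points into grid cells with redundant borders.

    Hypothetical helper, not the dissertation's algorithm. Each point index is
    stored as "core" in its home cell and as "halo" in every adjacent cell that
    could contain a neighbour within distance eps. Assumes all coordinates
    (spatial and temporal) share one pre-scaled unit and eps <= cell_size.
    """
    if not 0 < eps <= cell_size:
        raise ValueError("this sketch requires 0 < eps <= cell_size")

    cells = defaultdict(lambda: {"core": [], "halo": []})
    for idx, point in enumerate(points):
        home = tuple(floor(c / cell_size) for c in point)
        cells[home]["core"].append(idx)

        # For each dimension, collect the cell offsets reachable within eps.
        offsets_per_dim = []
        for coord, h in zip(point, home):
            lower = h * cell_size
            upper = lower + cell_size
            offsets = [0]
            if coord - lower <= eps:   # close to the lower face
                offsets.append(-1)
            if upper - coord <= eps:   # close to the upper face
                offsets.append(1)
            offsets_per_dim.append(offsets)

        # Copy the point into every reachable neighbouring cell.
        for offset in product(*offsets_per_dim):
            if any(offset):
                neighbour = tuple(h + o for h, o in zip(home, offset))
                cells[neighbour]["halo"].append(idx)
    return dict(cells)


if __name__ == "__main__":
    events = [(0.5, 0.5), (0.95, 0.5), (1.05, 0.5), (2.5, 2.5)]
    for cell, members in sorted(partition_with_halo(events, 1.0, 0.1).items()):
        print(cell, members)
```

Under these assumptions, each cell can be clustered independently from its core and halo points, and clusters that share a halo point can be merged afterwards. The cost is the duplicated border points, which grows with `eps` and with the number of dimensions.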
Keywords/Search Tags:geographic spatiotemporal big data, data partition, high performance distributed clustering, image segmentation, density-based clustering