
Research On Cloud Storage And Parallel Spatial Clustering Of Graph Data Under Cloud Computing Environment

Posted on: 2014-01-31 | Degree: Master | Type: Thesis
Country: China | Candidate: J F Lin
GTID: 2180330461472557 | Subject: Cartography and Geographic Information Engineering
Abstract/Summary:
In recent years, how to store and manage massive geospatial data efficiently, and how to deliver it as web services to a broad range of users, has become an increasingly prominent issue in geographic information science. Research on cloud storage of spatial data has concentrated mostly on raster data, while work on vector data remains fragmentary, and spatial data mining based on cloud storage still lacks systematic study. Taking into account the intrinsic differences between vector and raster data, this thesis proposes a cloud storage system that provides distributed, cloud-enabled storage management and services for massive geospatial data in both formats. On top of this cloud storage, an efficient parallel spectral clustering algorithm for data mining is designed and implemented.
The content and contributions of this thesis are as follows:
1. Based on a review of research in China and abroad on cloud computing, NoSQL, and graph computing, the basic theory of spatial cloud storage, spatial cloud services, and parallel clustering is set out. The graph-model representation of spatial data is studied, and the implementation and limitations of traditional spatial storage technology are analyzed. The principles, applicable scenarios, and strengths and weaknesses of different parallel computing models are also discussed.
2. For the proposed cloud storage system, a three-tier architecture is adopted, and separate implementation strategies and cloud storage management methods are developed for raster and vector data, each built on a NoSQL database system, together with a universal data access interface.
For raster data, the distributed file system HDFS and the column-family database HBase serve as the storage container, combined with a distributed spatial indexing technique; for vector data, the graph database Neo4j, chosen because it enforces ACID constraints, is used together with an R-tree spatial index.
3. Based on an in-depth analysis of the principles of spectral clustering and a comparison of different subgraph partitioning algorithms, the DiDiC partitioning algorithm is adopted for parallel clustering of graph data. On the basis of the MapReduce framework, a parallel spectral clustering algorithm is proposed.
4. Two detailed comparative experiments were carried out, one on graph data storage and one on parallel clustering. For graph data storage, GeoDAC was compared with the open-source spatial database PostGIS on vector data read and write performance. The preliminary results indicate that, although GeoDAC does not write faster than PostGIS, it delivers significantly better read and spatial query performance. For graph mining, the parallel spectral clustering algorithm was compared with its stand-alone version in a graph data mining efficiency test; the results show that parallelization significantly improves the performance of the graph data mining algorithm.
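The abstract does not say which distributed spatial index is used for raster data in HBase. One common option, shown here purely as an illustration, is to store each raster tile as an HBase row whose key interleaves the tile's column and row indices into a Z-order (Morton) code, so that tiles close together in space tend to sit close together in the sorted key space and can be fetched with range scans. The function name `tile_row_key` and the key layout (one zoom byte followed by an 8-byte big-endian Morton code) are hypothetical and not taken from the thesis.

```python
def tile_row_key(zoom: int, x: int, y: int) -> bytes:
    """Hypothetical HBase row key for a raster tile: zoom byte + Morton code.

    HBase sorts rows by the raw bytes of the key, so a big-endian encoding
    keeps numerically adjacent Morton codes adjacent in the table.
    """
    if not 0 <= zoom <= 32:
        raise ValueError("zoom must be in [0, 32] so the code fits in 64 bits")
    if not (0 <= x < 2 ** zoom and 0 <= y < 2 ** zoom):
        raise ValueError("tile indices out of range for this zoom level")
    code = 0
    for bit in range(zoom):
        # Interleave bits: x fills the even bit positions, y the odd ones.
        code |= ((x >> bit) & 1) << (2 * bit)
        code |= ((y >> bit) & 1) << (2 * bit + 1)
    return bytes([zoom]) + code.to_bytes(8, "big")


# Sorting the four zoom-1 tile keys yields Z-order: (0,0), (1,0), (0,1), (1,1).
keys = sorted(tile_row_key(1, x, y) for x in range(2) for y in range(2))
```

Prefixing the zoom level keeps each pyramid level in its own contiguous key range; a real design would also have to consider region hot-spotting, which this sketch ignores.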
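As background for the spectral clustering principle analyzed in the thesis, the sketch below shows its simplest textbook form: build the unnormalized graph Laplacian L = D - A from an adjacency matrix and split the vertices by the sign of the eigenvector belonging to the second-smallest eigenvalue (the Fiedler vector). It is a minimal single-machine illustration using NumPy, not the thesis's parallel algorithm, and the function name `spectral_bipartition` is hypothetical.

```python
import numpy as np


def spectral_bipartition(adjacency):
    """Split an undirected graph into two clusters with the Fiedler vector.

    `adjacency` is a symmetric (n x n) matrix of non-negative edge weights.
    Returns an array of 0/1 labels; which side gets 0 is arbitrary, because
    an eigenvector is only defined up to its sign.
    """
    A = np.asarray(adjacency, dtype=float)
    D = np.diag(A.sum(axis=1))      # degree matrix
    L = D - A                       # unnormalized graph Laplacian
    _, eigvecs = np.linalg.eigh(L)  # eigh returns eigenvalues in ascending order
    fiedler = eigvecs[:, 1]         # eigenvector of the 2nd-smallest eigenvalue
    return (fiedler >= 0).astype(int)


# Two triangles {0, 1, 2} and {3, 4, 5} joined by the single edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
A = np.zeros((6, 6))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0
labels = spectral_bipartition(A)
```

For k > 2 clusters, the usual extension takes the first k eigenvectors as vertex coordinates and runs k-means on them; on large graphs, computing those eigenvectors is the expensive step that a parallel version has to distribute.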
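The abstract does not describe how the parallel spectral clustering algorithm is decomposed on MapReduce. One common approach, sketched here as an assumption, starts from the fact that iterative eigensolvers such as the Lanczos method spend most of their time in sparse matrix-vector products y = Mx, which map naturally onto one MapReduce job: mappers emit partial products keyed by row, the shuffle groups them, and reducers sum them. The plain-Python functions below only simulate the three phases on one machine; their names are hypothetical, and in a real Hadoop job the vector x would typically be shipped to every mapper (for example via the distributed cache).

```python
from collections import defaultdict


def map_phase(entries, x):
    """Map: for each nonzero M[i][j], emit (row i, partial product M[i][j] * x[j])."""
    for i, j, value in entries:
        yield i, value * x[j]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by key, as the framework would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: sum the partial products of each row to obtain y[i]."""
    return {key: sum(values) for key, values in groups.items()}


def mapreduce_matvec(entries, x):
    """Compute y = M x for M given as sparse (row, col, value) triples.

    Rows with no nonzero entries emit nothing and are absent from the result.
    """
    return reduce_phase(shuffle_phase(map_phase(entries, x)))


# Laplacian of the path graph 0-1-2, stored as sparse triples.
laplacian = [
    (0, 0, 1.0), (0, 1, -1.0),
    (1, 0, -1.0), (1, 1, 2.0), (1, 2, -1.0),
    (2, 1, -1.0), (2, 2, 1.0),
]
y = mapreduce_matvec(laplacian, {0: 1.0, 1: 0.0, 2: -1.0})
```

Here x = (1, 0, -1) happens to be an eigenvector of this Laplacian with eigenvalue 1, so y equals x; an eigensolver would call such a job once per iteration.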
Keywords/Search Tags: cloud storage, NoSQL, graph database, spatial clustering, spectral clustering, parallel algorithms