
Research on K-means Remote Sensing Image Classification Algorithm Based on Hadoop

Posted on: 2018-04-19; Degree: Master; Type: Thesis
Country: China; Candidate: W Chen
GTID: 2310330518961603; Subject: Geodesy and Survey Engineering
Abstract/Summary:
K-means is a data mining and analysis method. It is simple and efficient, and it can group objects with similar spectral characteristics without any prior knowledge. For these reasons it has been widely used in remote sensing image classification. However, classifying massive high-resolution remote sensing images in a traditional distributed environment demands high-performance hardware infrastructure and complex programming, which greatly limits its application. Because the Hadoop distributed system framework offers high efficiency, high scalability, and high fault tolerance, it has been widely used to solve the storage and computing problems of massive data. Remote sensing image data, however, have a particular format. Existing Hadoop-based K-means classification therefore first converts the image data into text files, which leads to an excessive number of Map and Reduce tasks, too much time spent on network transmission, and insufficient memory allocation. How to organize massive remote sensing data effectively, read it quickly, and classify it efficiently has consequently become a hot topic in the field of remote sensing. This paper combines the powerful computing and storage capacity of the Hadoop cloud platform with the strong raster reading capability of GDAL (the open-source Geospatial Data Abstraction Library) to design a distributed K-means classification algorithm on Hadoop. The algorithm improves classification efficiency for massive remote sensing image data while preserving classification accuracy. The main contents and achievements of this paper are as follows:
(1) Design of remote sensing image input and output formats. Hadoop's built-in input and output formats cannot transmit remote sensing image data. This paper therefore extends the base classes Hadoop provides for data input and output and customizes input and output formats for remote sensing images that do not destroy the image data structure.
(2) A Hadoop-based organization of remote sensing image data, with a matching data access method. Combining the advantages of HDFS and HBase, the image files are stored in HDFS and their metadata in HBase, and the image data are accessed at a specific granularity. This effectively improves the access efficiency of massive image data on the cloud platform.
(3) A Hadoop-based K-means remote sensing image classification algorithm. This part addresses the K-means algorithm's problems of initial cluster center selection, similarity criterion, and time complexity. Building on the MapReduce distributed programming model and on GDAL's fast read and write capability for raster data, it implements K-means classification of remote sensing image data on the Hadoop platform. Image data of different sizes are classified on the platform, and the classification accuracy and platform performance are analyzed. The experimental results show that, for massive remote sensing image data on the Hadoop cloud platform, the method improves classification accuracy compared with the traditional K-means algorithm. Compared with running K-means on Hadoop over image data converted to text files, it also improves the computing power and computational efficiency of the platform.
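Item (3) describes running K-means inside the MapReduce model. The following is a minimal sketch of one common way to map a K-means iteration onto map, shuffle, and reduce phases. It is written in plain Python so it runs without a cluster, and it is not the thesis's implementation. The sketch rests on several assumptions. Each image block is taken to be already decoded into a list of per-pixel spectral vectors (tuples of band values). Euclidean distance serves as the similarity criterion. The initial centers are supplied by the caller, so the thesis's own choices for initial center selection and similarity criterion are not reproduced. The names `map_block`, `reduce_cluster`, `kmeans_mapreduce`, and `classify` are hypothetical. An in-memory dictionary stands in for Hadoop's shuffle, and the per-block partial sums play the role of a combiner.

```python
from collections import defaultdict
from math import dist  # Euclidean distance, Python 3.8+


def nearest(pixel, centers):
    """Index of the center closest to `pixel` (Euclidean distance)."""
    return min(range(len(centers)), key=lambda i: dist(pixel, centers[i]))


def map_block(block, centers):
    """Map phase for one image block (hypothetical helper).

    Emits (cluster_id, (partial_sum, count)) pairs. Sums are combined inside
    the mapper, so each block emits at most k records instead of one per pixel.
    """
    partial = {}
    for px in block:
        c = nearest(px, centers)
        s, n = partial.get(c, ([0.0] * len(px), 0))
        partial[c] = ([a + b for a, b in zip(s, px)], n + 1)
    return list(partial.items())


def reduce_cluster(values):
    """Reduce phase for one cluster id: merge partial sums into a new center."""
    total, count = None, 0
    for s, n in values:
        total = list(s) if total is None else [a + b for a, b in zip(total, s)]
        count += n
    return tuple(x / count for x in total)


def kmeans_mapreduce(blocks, centers, max_iter=20, tol=1e-6):
    """Driver loop: one simulated MapReduce job per K-means iteration."""
    centers = [tuple(c) for c in centers]
    for _ in range(max_iter):
        shuffled = defaultdict(list)  # stands in for Hadoop's shuffle/sort
        for block in blocks:
            for cid, value in map_block(block, centers):
                shuffled[cid].append(value)
        new_centers = list(centers)  # an empty cluster keeps its old center
        for cid, values in shuffled.items():
            new_centers[cid] = reduce_cluster(values)
        shift = max(dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:  # converged: no center moved noticeably
            break
    return centers


def classify(block, centers):
    """Label every pixel of a block with the index of its nearest center."""
    return [nearest(px, centers) for px in block]


if __name__ == "__main__":
    # Two toy "blocks" of two-band pixels forming two obvious clusters.
    blocks = [[(10, 12), (11, 13), (50, 52)], [(49, 51), (12, 11), (51, 50)]]
    centers = kmeans_mapreduce(blocks, [(10, 12), (50, 52)])
    print(centers)  # [(11.0, 12.0), (50.0, 51.0)]
    print(classify(blocks[1], centers))  # [1, 0, 1]
```

In an actual Hadoop job, each iteration would typically be a separate MapReduce job. The current centers would be made available to every mapper, for example through the job configuration or a cached file. The driver would then compare the old and new centers between jobs to decide whether to stop.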
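Items (1) and (2) rely on splitting an image into pieces without flattening it into text. This is what a custom input format and access "at a specific granularity" would have to do. The following is a minimal sketch under assumed parameters, not the thesis's code. It computes non-overlapping pixel windows that cover a raster exactly once and clips edge tiles to the image bounds. `Window`, `tile_windows`, and the 512-pixel tile size are hypothetical. Each window carries the offsets and sizes a reader could pass to `Band.ReadAsArray(xoff, yoff, win_xsize, win_ysize)` in GDAL's Python bindings to read only that part of a band.

```python
from typing import Iterator, NamedTuple


class Window(NamedTuple):
    """A rectangular pixel window (hypothetical payload of one input split)."""
    xoff: int
    yoff: int
    xsize: int
    ysize: int


def tile_windows(width: int, height: int, tile: int = 512) -> Iterator[Window]:
    """Yield row-major windows that cover a width x height raster exactly once.

    Edge tiles are clipped so that no window extends past the image.
    """
    if width <= 0 or height <= 0 or tile <= 0:
        raise ValueError("width, height and tile must be positive")
    for yoff in range(0, height, tile):
        for xoff in range(0, width, tile):
            yield Window(xoff, yoff,
                         min(tile, width - xoff), min(tile, height - yoff))


if __name__ == "__main__":
    for w in tile_windows(1000, 600, 512):
        print(w)
```

For a 1000 x 600 image with 512-pixel tiles, this yields four windows. Two are full-width 512 x 512 or clipped 488 x 512 tiles along the top, and two are 88-row strips along the bottom. The window list is small enough to serve as split metadata, while the pixel data stay in the original image file.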
Keywords: K-means algorithm, Hadoop, MapReduce, GDAL, remote sensing image