
Research On Remote Sensing Image Classification Algorithm Based On Parallel Support Vector Machine And Spark Platform

Posted on: 2019-04-24 | Degree: Master | Type: Thesis
Country: China | Candidate: L L Huang
GTID: 2382330566461078 | Subject: Cartography and Geographic Information System
Abstract/Summary:
Big data processing technologies such as Hadoop and Spark are widely used for remote sensing big data, and machine learning classification methods are widely applied to the classification and recognition of remote sensing images. How machine learning algorithms can classify large volumes of remote sensing imagery quickly and efficiently is therefore an urgent problem. Existing research has implemented scheduling of MPI tasks and GPU tasks on the Spark platform, but its task division is a dichotomy: each task is either an MPI task or a GPU task, and GPU tasks are not embedded in MPI processes. To make full use of MPI's coarse-grained parallel programming model and CUDA's fine-grained parallel programming model, this paper proposes nesting CUDA parallelism inside MPI multi-processes on the Spark framework, so as to improve processing speed more effectively, and builds a high-performance parallel computing framework for support vector machines based on Spark. On this framework, the paper improves the support vector machine algorithm and implements high-performance parallel computing of the support vector machine classification algorithm. Using Landsat 8 remote sensing imagery of Shanghai as experimental data, experiments were conducted under different conditions, with the following results:

1) On a single node, the classification accuracy of MPI-CUDA parallel remote sensing image classification is 1.05% lower than that of single-machine serial classification, but its calculation speed is increased by 6.3 times.

2) In the single-node MPI-CUDA environment, classification accuracy decreases gradually as the number of subprocesses increases, from 94.64% (one subprocess) to 92.42% (10 subprocesses); once the number of processes exceeds 8, the classification accuracy of the support vector machine tends to be stable. The calculation speed, by contrast, increases gradually and eventually stabilizes (once the number of processes exceeds 9).

3) As the number of nodes increases, the calculation time of the support vector machine classification algorithm gradually decreases and the acceleration ratio (speedup) gradually increases, but its growth rate gradually slows; with 2, 4, and 6 nodes, the corresponding acceleration ratios are 1.62, 2.34, and 2.65.

Although the proposed framework causes a slight decrease in the classification accuracy of remote sensing images, a comprehensive analysis of the classification accuracy and classification time of support vector machines in the different environments shows that the proposed high-performance parallel computing framework based on the Spark platform is feasible and reliable. Embedding CUDA parallelism in MPI multi-processes is an efficient hybrid parallel mode.
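The slowing growth of the acceleration ratio in result 3) can be made concrete with a short calculation. The sketch below is not part of the thesis: it takes the three reported ratios and derives parallel efficiency (ratio divided by node count) and the speedup gained per added node between consecutive reported points. It assumes the ratios are measured against a single-node run, which the abstract does not state explicitly; the function and variable names are illustrative only.

```python
# Acceleration ratios reported in the abstract, keyed by node count.
# Assumption: each ratio is relative to a single-node run.
reported_speedup = {2: 1.62, 4: 2.34, 6: 2.65}


def parallel_efficiency(speedup, nodes):
    """Fraction of ideal linear speedup achieved with `nodes` nodes."""
    return speedup / nodes


def marginal_gains(speedups):
    """Speedup gained per added node between consecutive reported points."""
    points = sorted(speedups.items())
    return [
        (n1, (s1 - s0) / (n1 - n0))
        for (n0, s0), (n1, s1) in zip(points, points[1:])
    ]


# Round to three decimals for display.
efficiency = {
    n: round(parallel_efficiency(s, n), 3) for n, s in reported_speedup.items()
}
gains = [(n, round(g, 3)) for n, g in marginal_gains(reported_speedup)]

for n in sorted(efficiency):
    print(f"{n} nodes: speedup {reported_speedup[n]}, efficiency {efficiency[n]}")
for n, g in gains:
    print(f"up to {n} nodes: {g} speedup gained per added node")
```

Under that assumption, efficiency falls from 0.81 with 2 nodes to roughly 0.44 with 6 nodes, and each added node contributes less than the one before, which matches the abstract's statement that the growth rate of the acceleration ratio gradually declines.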
Keywords/Search Tags: Spark, MPI, CUDA, Machine Learning, Support Vector Machines, Remote Sensing Classification