Research On Physical Marine Big Data Cloud Computing Technology Based On Spark

Posted on: 2019-05-09
Degree: Master
Type: Thesis
Country: China
Candidate: Y Z Yang
GTID: 2370330578471911
Subject: Surveying and mapping engineering
Abstract/Summary:
In recent years, emerging technologies such as big data and cloud computing have developed rapidly and are now widely used in e-commerce, education, medical care, and transportation. Cloud computing offers users reliable, customized services that maximize resource utilization, together with secure data storage, convenient Internet services, and powerful computing capabilities. China is currently investing heavily in the development of its marine sector. As marine exploration technology continues to improve, marine data has become massive, complex, and diverse, which poses a tremendous challenge for data management, data utilization, and marine knowledge mining.

This thesis studies cloud computing technology based on Hadoop and Spark, designs a cloud storage and processing scheme for physical marine data, and applies the scheme to the statistical analysis of that data. Compared with the traditional file-server processing mode, which suffers from high configuration cost, low processing efficiency, and a complicated programming model, cloud computing based on Hadoop and Spark has clear advantages in distributed data storage and parallel computing. For distributed storage of marine big data, the thesis uses HDFS as the underlying storage framework, studies in depth the overall HDFS architecture and how data is written, segmented into blocks, backed up, and restored, and compares HDFS with the local file system. For analytical processing of marine big data, the thesis combines Spark and YARN in a parallel framework design, builds a distributed NetCDF data set on top of RDDs, and processes marine data covering a large area in parallel by rewriting the data-reading interface. It also designs a performance optimization scheme for YARN cluster mode that tunes the HDFS block size, the Spark application submission parameters, and the YARN resource allocation parameters.

Finally, the thesis summarizes the steps for building the Hadoop and Spark cluster environments and compares the efficiency of query and statistics operations on 40 years of wave data for the East China Sea. The experiment shows that, compared with the stand-alone mode, the cloud computing mode has a decisive advantage when processing large volumes of data.

The thesis then applies the Hadoop- and Spark-based cloud computing technology to the statistical analysis of scatter diagrams of significant wave height against zero-crossing period. First, the scatter diagram statistics follow the traditional definitions to obtain the joint distribution of significant wave height and zero-crossing period, and the seasonal and geographical variation of significant wave height is analyzed. Second, the extreme value type I distribution function (with parameters estimated by the Gumbel method) and the theory of the joint wave height-period distribution function proposed by Ochi are used to calculate extreme significant wave heights and expected periods for different return periods.
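
The two statistics named at the end of the abstract can be illustrated with a small single-machine sketch. It is not the thesis's Spark implementation: the function names, the bin widths, and the use of a method-of-moments fit for the Gumbel (extreme value type I) parameters are all assumptions made for illustration, since the abstract does not give the exact estimation procedure. The scatter diagram simply counts (wave height, zero-crossing period) pairs per bin, and the return level uses the standard Gumbel quantile for annual maxima.

```python
import math
from collections import Counter

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def scatter_diagram(records, hs_step=0.5, tz_step=1.0):
    """Count (Hs, Tz) pairs per bin; keys are the lower edges of each bin.

    records: iterable of (significant wave height in m, zero-crossing period in s).
    The bin widths are illustrative defaults, not values from the thesis.
    """
    counts = Counter()
    for hs, tz in records:
        hs_bin = math.floor(hs / hs_step) * hs_step
        tz_bin = math.floor(tz / tz_step) * tz_step
        counts[(hs_bin, tz_bin)] += 1
    return counts


def fit_gumbel_moments(annual_maxima):
    """Estimate Gumbel location and scale by the method of moments (assumed here)."""
    n = len(annual_maxima)
    mean = sum(annual_maxima) / n
    variance = sum((x - mean) ** 2 for x in annual_maxima) / (n - 1)
    scale = math.sqrt(6.0 * variance) / math.pi
    location = mean - EULER_GAMMA * scale
    return location, scale


def return_level(location, scale, period_years):
    """Value exceeded on average once every period_years (annual maxima assumed)."""
    non_exceedance = 1.0 - 1.0 / period_years
    return location - scale * math.log(-math.log(non_exceedance))
```

In a cluster setting the per-record binning step is the part that parallelizes naturally, because each record contributes independently to one bin count; the Gumbel fit then runs on the much smaller set of annual maxima.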
Keywords/Search Tags: Hadoop, YARN, Spark, NetCDF, Wave height-period joint distribution