Research And Implementation Of Seismic Big Data Parallel Processing System Based On Spark

Posted on:2022-02-23

Degree:Master

Type:Thesis

Country:China

Candidate:S Zhang

GTID:2480306548963999

Subject:Naval Architecture and Marine Engineering

Abstract/Summary:

Earthquakes are a common natural disaster that can be monitored and predicted with related technologies. With the rapid development of the Internet, the seismic station system has entered the era of "big data". Stations in every province of China produce massive volumes of seismic data every day, and the previous seismic platform can no longer meet current needs for data acquisition, storage, retrieval, and calculation. Proposing a feasible and effective solution therefore has important research significance and application value. This thesis takes the real environment of the stations in Shandong Province and the original seismic platform as its research object, and studies and implements a big data parallel processing system. The main research contents are as follows.

First, data acquisition based on stream processing. The thesis studies the Flume component for streaming data acquisition to address the earlier problem of collecting streaming data from hundreds of stations and writing corresponding client code for them, and studies the Kafka message queue to address data loss during collection and data backlog during peak production periods.

Second, distributed storage and retrieval based on a combination of multiple databases. The thesis studies the HBase column-oriented database and the HDFS distributed file system to address the earlier problem of sharing data between departments, and studies a memory-based HBase secondary index to serve user queries on non-row-key fields.

Third, parallel computing for multi-language algorithms. The thesis studies the Spark distributed parallel computing framework to address the long computation times of single-machine processing at the Seismological Bureau, and studies a scheme for adapting multi-language algorithms to the computing engine so that algorithms written in different languages can be scheduled uniformly.

Fourth, the realization of the seismic big data parallel processing platform. The platform is an upgrade of the original seismic platform. On the front end, a large-screen visual display lets users access earthquake-related information quickly and easily. On the back end, the business logic is divided into five modules: user management, algorithm upload, algorithm application, algorithm calculation, and resource monitoring.

The system has achieved its expected design goals and has been applied in the calculation service department of the Seismological Bureau. The streaming data acquisition and processing mode ensures the speed of real-time acquisition for users while avoiding data loss. The multi-database distributed storage and retrieval framework ensures data sharing between departments while meeting the requirements for data retrieval speed. The multi-language parallel computing engine realizes unified parallel scheduling of multi-language algorithms and increases computing speed by more than 10 times. Finally, the seismic big data parallel processing system is designed to meet the requirements of the staff of the Seismological Bureau for algorithm storage, calculation and visualization...
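
The memory-based secondary index mentioned in the abstract can be illustrated with a minimal Python sketch. This is not the thesis implementation: the plain dictionary `table` stands in for an HBase table, and the row-key layout, the `station` field, and the station codes are all hypothetical. It only shows the general idea of mapping a non-row-key field value to the row keys that contain it, so that a query on that field avoids a full table scan.

```python
from collections import defaultdict


class InMemorySecondaryIndex:
    """Maps values of one non-row-key field to the row keys that hold them."""

    def __init__(self, field):
        self.field = field
        self._index = defaultdict(set)

    def put(self, row_key, row):
        # Index the row only if it actually carries the indexed field.
        if self.field in row:
            self._index[row[self.field]].add(row_key)

    def remove(self, row_key, row):
        # Drop the row key from the index and clean up empty entries.
        value = row.get(self.field)
        if value in self._index:
            self._index[value].discard(row_key)
            if not self._index[value]:
                del self._index[value]

    def lookup(self, value):
        # Return matching row keys in sorted (row-key) order.
        return sorted(self._index.get(value, ()))


# Hypothetical stand-in for an HBase table: row key -> column values.
table = {
    "20210101T000000#0001": {"station": "SD001", "magnitude": "2.1"},
    "20210101T000500#0002": {"station": "SD002", "magnitude": "1.4"},
    "20210101T001000#0003": {"station": "SD001", "magnitude": "3.0"},
}

# Build the index on the non-row-key field "station".
index = InMemorySecondaryIndex("station")
for row_key, row in table.items():
    index.put(row_key, row)


def query_by_station(station):
    """Resolve row keys through the index, then fetch rows by key."""
    return [table[key] for key in index.lookup(station)]
```

In a real deployment the index would also have to be kept consistent with table writes and rebuilt after a restart, which this sketch does not attempt.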

Keywords/Search Tags:

Seismic big data, Distributed storage, Parallel computing, Spark

Related items

1	Research On Reverse Time Migration Data Processing Method Based On Cloud Computing
2	Storage And Parallel Query Technology Research In Distributed Environments Massive Spatial Data
3	Distributed Parallel Computing Environment Of Gml Spatial Data Partitioning Strategy And Algorithm Research
4	The Research For Key Technology Of Astronomy Big Data Integration Based On Spark
5	Parallel Computing Of Spark-based Geospatial Analysis Algorithms
6	Efficient Storage And Parallel Overlay Analysis Of Massive Vector Data In The Cloud Computing Environment
7	A Research On Distributed Logistics Optimization Algorithm Based On Spark
8	Research On Distributed Computing Of Raster Big Data Based On GeoTrellis
9	Research On Parallel Computing And Remote Sensing Data Generation Method For Distributed Hydrological Simulation
10	Research On Distributed Parallel Operation Method Of Terrestrial Carbon Cycle Model