Research And Design Of The Distributed File System Focused On Seismic Big Data

Posted on:2015-02-11

Degree:Master

Type:Thesis

Country:China

Candidate:M H Lu

Full Text:PDF

GTID:2250330431450064

Subject:Network Communication System and Control

Abstract/Summary:


With the rapid development of science and technology in modern society and the growing popularity of the Internet, the concept of the cloud has become widely understood. With the advent of the cloud era, big data has also attracted attention, and, owing to considerable progress in information technology, its applications have penetrated every aspect of society. In seismic exploration, the volume of data produced has grown greatly in order to meet social needs. Although this growth reflects strong demand for oil, natural gas and other resources, handling such massive data poses a very serious challenge for seismic exploration. Seismic big data raises problems in many areas, including storage, reading, redundancy and extraction; this paper focuses on storage and reading.

In practice, reading seismic data must take users' specific circumstances into account in order to meet their needs, which are generally expressed as speed and efficiency, while also respecting the characteristics of seismic data. To address these problems, this paper designs an architecture based on two strategies: distribution and hierarchy. Distribution means distributed storage of seismic data: the whole data set is spread across several nodes that store it separately, and a single master node manages these nodes. Hierarchy means hierarchical querying: to retrieve data, users query the nodes hierarchically, from the master node down to the storage nodes. For SEG-Y, the format in which seismic data is actually stored, this paper proposes some improvements and compares the improved format with the original one; the results show that the improved format performs better than the original to some extent.
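The abstract does not say which changes were made to SEG-Y, so the following is only a sketch of the standard SEG-Y rev 1 layout that any such improvement starts from, not the thesis's improved format. It shows why reads can be fast when traces have a fixed length: the byte position of trace i follows from three binary-header fields (sample interval, samples per trace, data sample format code), so a reader can seek directly to it. It assumes no extended textual headers and fixed-length traces; the function names are illustrative.

```python
import struct

# Fixed sizes from the SEG-Y rev 1 layout (bytes).
TEXT_HEADER_BYTES = 3200
BINARY_HEADER_BYTES = 400
TRACE_HEADER_BYTES = 240

# Bytes per sample for common data sample format codes (file bytes 3225-3226).
BYTES_PER_SAMPLE = {1: 4, 2: 4, 3: 2, 5: 4, 8: 1}


def parse_binary_header(buf: bytes) -> dict:
    """Extract the fields needed to address traces from the 400-byte binary header.

    Offsets are relative to the start of the binary header (file byte 3201);
    values are big-endian, as the standard requires.
    """
    if len(buf) < BINARY_HEADER_BYTES:
        raise ValueError("binary header must be 400 bytes")
    (interval_us,) = struct.unpack(">H", buf[16:18])  # file bytes 3217-3218
    (samples,) = struct.unpack(">H", buf[20:22])      # file bytes 3221-3222
    (fmt,) = struct.unpack(">h", buf[24:26])          # file bytes 3225-3226
    if fmt not in BYTES_PER_SAMPLE:
        raise ValueError(f"unsupported data sample format code: {fmt}")
    return {"sample_interval_us": interval_us, "samples_per_trace": samples, "format_code": fmt}


def trace_offset(index: int, header: dict) -> int:
    """Byte offset of trace `index` (0-based), assuming fixed-length traces
    and no extended textual headers."""
    trace_bytes = TRACE_HEADER_BYTES + header["samples_per_trace"] * BYTES_PER_SAMPLE[header["format_code"]]
    return TEXT_HEADER_BYTES + BINARY_HEADER_BYTES + index * trace_bytes


if __name__ == "__main__":
    # Synthetic binary header: 2 ms sampling, 1000 samples, IEEE float (code 5).
    hdr = bytearray(BINARY_HEADER_BYTES)
    struct.pack_into(">H", hdr, 16, 2000)
    struct.pack_into(">H", hdr, 20, 1000)
    struct.pack_into(">h", hdr, 24, 5)
    info = parse_binary_header(bytes(hdr))
    print(trace_offset(0, info))  # 3600
    print(trace_offset(3, info))  # 3600 + 3 * (240 + 1000 * 4) = 16320
```

If a file contains variable-length traces or extended textual headers, this arithmetic no longer holds and a per-trace offset table is needed instead.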
In addition, the paper adds a two-level index structure to the proposed architecture. Users can then quickly find the specific location of data by querying the index and carry out the read operation, which ensures the speed and efficiency users require. These implementation details, built on the distribution and hierarchy strategies, are the innovations of this paper.

Building on the research discussed above, this paper uses two distributed file systems for its investigation: Fast DFS and Hadoop DFS. Combining the distribution and hierarchy architecture with the characteristics of seismic data and practical needs, this paper integrates these elements into both distributed file systems to make them more capable of handling problems in seismic exploration. The paper also carries out several experiments to test file operations on the new file systems and compares them with the original ones. The results show that the new distributed file systems are better suited to seismic big data, offering faster reads while keeping the overall process efficient. Because the new systems offer these advantages for seismic big data and are also easy to operate and user-friendly, they have broad application prospects in seismic exploration.
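The abstract names a master node, storage nodes and a two-level index but does not give their layout. The sketch below is one plausible reading, under stated assumptions: the first-level index on the master maps a file name to the storage node holding it, and the second-level index on each storage node maps (file name, trace number) to a byte offset and length. The class names, the CRC32 placement policy and the in-process calls (a real system would use RPCs) are all hypothetical.

```python
import zlib
from dataclasses import dataclass, field


@dataclass
class StorageNode:
    """A storage node holding the second-level index for the files it stores."""
    node_id: str
    # Second-level index: (file_name, trace_no) -> (byte_offset, length) in the local file.
    local_index: dict = field(default_factory=dict)

    def add_trace(self, file_name: str, trace_no: int, offset: int, length: int) -> None:
        self.local_index[(file_name, trace_no)] = (offset, length)

    def locate(self, file_name: str, trace_no: int) -> tuple:
        return self.local_index[(file_name, trace_no)]


class MasterNode:
    """The master node holding the first-level index: file_name -> storage node id."""

    def __init__(self, nodes):
        self.nodes = {n.node_id: n for n in nodes}
        self.node_ids = sorted(self.nodes)
        self.file_index = {}

    def register_file(self, file_name: str, trace_extents) -> str:
        """Place a file on one storage node and index its traces there.

        trace_extents is a list of (byte_offset, length) pairs, one per trace.
        Placement hashes the file name with CRC32 (a hypothetical policy).
        """
        node_id = self.node_ids[zlib.crc32(file_name.encode()) % len(self.node_ids)]
        self.file_index[file_name] = node_id
        node = self.nodes[node_id]
        for trace_no, (offset, length) in enumerate(trace_extents):
            node.add_trace(file_name, trace_no, offset, length)
        return node_id

    def query(self, file_name: str, trace_no: int) -> tuple:
        """Hierarchical lookup: master index first, then the storage node's index."""
        node_id = self.file_index[file_name]                               # level 1
        offset, length = self.nodes[node_id].locate(file_name, trace_no)  # level 2
        return node_id, offset, length


if __name__ == "__main__":
    master = MasterNode([StorageNode("node-a"), StorageNode("node-b")])
    placed_on = master.register_file("line_001.sgy", [(3600, 4240), (7840, 4240)])
    print(master.query("line_001.sgy", 1) == (placed_on, 7840, 4240))  # True
```

In this design the master keeps only one entry per file, so its index stays small even when files hold millions of traces; the per-trace detail is spread across the storage nodes alongside the data it describes.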

Keywords/Search Tags:

massive data, distribution and hierarchy, small files, SEG-Y format, two-level index, Fast DFS, Hadoop DFS


Related items

1	Construction And Application Of Massive GNSS Files Cloud Storage Based On Hadoop
2	The Research Of Multi-format Seismic Data Access And Conversion Technology
3	Massive Spatial Data Storage And Management Based On Hadoop
4	Research On Small Files Storage Performance Optimization Based On Time Series Prediction
5	Design And Study On The Model Of Storing And Processing Massive Spatial Data Concurrently And Efficiently
6	Research On Vector Data Rendering Technology Based On Hadoop And Mapnik
7	Design And Implementation Of GIS Visual Analysis Platform Based On Hadoop
8	Research And Implementation Of Massive Terrain Data File System
9	Optimization Of Large Scale Of Files Transfer In Meteorological Grid
10	Iterative Divide-and-conquer Method Of Estimating Index Coefficients In Single-index Model Under Massive Data