
Research On Data Modeling And Indexing Technology For Marine Environment Monitoring Data

Posted on: 2017-05-15
Degree: Master
Type: Thesis
Country: China
Candidate: L Sun
Full Text: PDF
GTID: 2180330509956416
Subject: Computer Science and Technology
Abstract/Summary:
In recent years, the rapid development of information technology, especially the Internet, networking and social networking technologies, has driven fast growth in data volume across industries. Enterprise data grows by 50% annually, and by conservative estimates at least 1.5 billion TB of data are generated worldwide each year. The study of industry big data has therefore become a hot research topic.
Marine development is becoming increasingly important in the national strategy, and the many types of applications in the marine field are of great significance in advancing marine informatization. The development and wide deployment of marine environment monitoring instruments and equipment, including buoys, satellites, remote sensing, sensors, monitoring stations and other real-time data acquisition sources, has not only brought intensive growth in data volume but also produced large amounts of heterogeneous, diverse, real-time data. Marine data has thus become a typical example of big data.
In-depth analysis of the data model, together with intensive study of data storage, data partitioning and data query strategies, is essential to solving the urgent problems facing marine applications.
Marine data is heterogeneous, diverse and complex, and its storage model differs from the relational structure. This puts great pressure on query efficiency and effective data use, and raises the following question: how can a structured data model be obtained that divides and transforms tabular data rationally, without manual adjustment of the low-level views and details?
Marine data is also massive, highly similar and strongly spatially correlated, which creates major obstacles to fast data queries in marine applications (such as the online polar inspection application and the strange inversion application for tide disasters). This raises two further questions: 1) how to partition data across computing nodes to meet the demands for high computing performance, real-time feedback and high-frequency queries in marine applications; 2) how to design a multi-layer index structure that meets real-time query response requirements for multi-source data and so accelerates the digitization and informatization of the ocean. To this end, this paper proposes both a global division strategy and a local partition strategy, on which a master-slave index mechanism is built to improve query efficiency and data use.
Research on efficient distributed data storage and indexing technology has long been important. Based on the problems above, we aim to optimize storage and query performance for marine data along the following technical route: data modeling, data division and index structure. The main innovations and research contents are as follows:
(1) A storage model and representation for marine data are given first. For this unstructured data, especially data stored in Excel and CSV files, a new relational data model based on the Tabular Model is proposed and the associated query optimization problems are described. The Tabular Model uses the PartiPath tree to preserve the form of the semantic information and to build the relationship pattern. In addition, several basic query issues are presented, and a user interest and fusion index is proposed to improve the query similarity index so as to meet specific query needs.
(2) Data partitioning is the key problem in data storage. First, this paper derives the characteristics and internal regularities of the collected data by analyzing its spatial autocorrelation and spatial distribution. The AMDM global division strategy is then formulated to achieve a reasonable storage distribution across all computing nodes. Second, an adaptive local partitioning strategy, AMSP, is developed based on user behavior; it migrates local data in real time to keep the storage nodes consistent and balanced. Based on the combined AMDM and AMSP division strategy, a multi-index structure is designed to improve data utilization and the access efficiency of the data pool.
(3) On the basis of the data division above, this paper proposes a master/slave index architecture, with a time-interval B+-tree as the global index and an L-RR* tree based on AMSP as the local index. Query processing has two steps. First, the Receiver accepts the query, finds all relevant nodes through the master index, and builds the query link. Second, the local index searches are performed in parallel and the final results are returned to the client.
In addition, experiments validate the correctness and effectiveness of the proposed architecture. The results show that the index structure satisfies real-time query response requirements on massive multi-source data as well as the demand for high-performance parallel computing.
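The two-step query processing described in item (3) can be illustrated with a minimal Python sketch. It is not the thesis's implementation: a sorted list searched with `bisect` stands in for the time-interval B+-tree, a linear scan stands in for the L-RR* tree, and all names (`Record`, `StorageNode`, `GlobalTimeIndex`, `query`) are hypothetical. The sketch also assumes that each node owns one half-open time interval and that intervals do not overlap.

```python
from bisect import bisect_right
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field


@dataclass
class Record:
    t: float      # observation time
    lon: float    # longitude
    lat: float    # latitude
    value: float  # measured quantity


@dataclass
class StorageNode:
    """Hypothetical storage node; a linear scan stands in for the local L-RR* tree."""
    name: str
    records: list = field(default_factory=list)

    def local_search(self, t0, t1, box):
        lon0, lat0, lon1, lat1 = box
        return [r for r in self.records
                if t0 <= r.t <= t1
                and lon0 <= r.lon <= lon1
                and lat0 <= r.lat <= lat1]


class GlobalTimeIndex:
    """Stand-in for the global time-interval B+-tree (assumption: a sorted list).

    Each entry is (start, end, node) for the half-open interval [start, end);
    intervals are assumed not to overlap.
    """

    def __init__(self, entries):
        self.entries = sorted(entries, key=lambda e: e[0])
        self.starts = [e[0] for e in self.entries]

    def lookup(self, t0, t1):
        # Begin at the last interval whose start is <= t0, then walk forward
        # while intervals can still overlap the query range [t0, t1].
        i = max(bisect_right(self.starts, t0) - 1, 0)
        nodes = []
        while i < len(self.entries) and self.entries[i][0] <= t1:
            start, end, node = self.entries[i]
            if end > t0:
                nodes.append(node)
            i += 1
        return nodes


def query(index, t0, t1, box):
    # Step 1: the master (global) index selects the relevant nodes.
    nodes = index.lookup(t0, t1)
    if not nodes:
        return []
    # Step 2: local searches run in parallel; results are merged in node order.
    with ThreadPoolExecutor(max_workers=len(nodes)) as pool:
        parts = list(pool.map(lambda n: n.local_search(t0, t1, box), nodes))
    return [r for part in parts for r in part]
```

Calling `query(index, t0, t1, (lon0, lat0, lon1, lat1))` returns the matching records from every node whose time interval overlaps the query range. Threads are used here only to show the parallel fan-out; in a real deployment the local searches would run on separate machines.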
Keywords/Search Tags:Big Marine Data, Data Modeling, Data Partitioning, Data Indexing