| In recent years, with the rapid development of information technology, meteorological and ocean observation data, product data, and Internet data have all been growing explosively, gradually becoming big data with the “5V” properties of Variety, Volume, Velocity, Veracity, and Value. This growth poses great challenges to efficient data management and utilization, including data finding and access, data preprocessing, and data application. Studying key technologies such as efficient meteorological and ocean data finding and access, data preprocessing, and efficient computation in weather prediction and ocean forecasting is therefore of great significance for meeting the timeliness and refinement requirements of real data applications. Moreover, as an abstraction of data features, a data model describes data structure, data operations, and data constraints at an abstract level, and plays an important role in supporting the implementation and optimization of algorithms in practical applications. In particular, optimizing applications by innovating data models has become an important topic in the big data area. From the perspective of the data management and utilization process above, this dissertation studies key technologies for meteorological and ocean application optimization by optimizing different data models in three aspects: data service discovery, dimensionality reduction of high-dimensional data, and data parallel computing.

In the data finding and access stage, to address the low access rate and low efficiency of existing data service discovery methods, this dissertation optimizes the data organization and index model and proposes OSID, an ocean data service discovery method based on ontology semantics and a quick index. OSID comprises two stages: the construction of the semantics-based ocean data ontology model ODOM, and semantic pre-reasoning optimization based on an extension of the quick service query list QSQL. In the ODOM construction stage, the metadata of the relevant elements in the ocean data acquisition process serves as the concepts of the ontology model, and the relationships between concepts are described through hierarchical classification, datatype properties, and object properties, so that ocean data can be described in a unified manner. In the semantic pre-reasoning optimization stage, the semantic relationships of the model are pre-inferred. On the one hand, the semantic dictionary WordNet is used to obtain the synonyms, hypernyms, and hyponyms of the concepts in ODOM, extending the semantic relationships of the model hierarchically. On the other hand, based on RDF entailment rules and OWL semantics, the semantic relationships are extended horizontally using the object properties of the model. Building on ODOM, we extend the basic QSQL data structure and publish the extended semantic relationships into the extended QSQL, so that the semantic reasoning process is completed in advance at publishing time and the response time of semantic computation during service queries is reduced. Experimental results show that, compared with existing service discovery methods, OSID not only significantly reduces the response time of service discovery but also supports queries for more complex, semantically related services, thus achieving a higher recall rate.
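To make the hierarchical extension step concrete, the sketch below collects the synonyms, hypernyms, and hyponyms of a concept with NLTK's WordNet interface. It is a minimal illustration, assuming NLTK as a stand-in for whatever WordNet binding the dissertation uses; the concept name "temperature" is a hypothetical example rather than an actual ODOM concept.

```python
# Minimal sketch of the WordNet-based hierarchical extension, assuming NLTK.
# Requires a one-time: import nltk; nltk.download("wordnet")
from nltk.corpus import wordnet as wn

def related_terms(concept: str) -> dict:
    """Collect synonyms, hypernyms, and hyponyms for one (hypothetical) concept."""
    synonyms, hypernyms, hyponyms = set(), set(), set()
    for synset in wn.synsets(concept):
        synonyms.update(lemma.name() for lemma in synset.lemmas())
        hypernyms.update(l.name() for s in synset.hypernyms() for l in s.lemmas())
        hyponyms.update(l.name() for s in synset.hyponyms() for l in s.lemmas())
    return {"synonyms": synonyms, "hypernyms": hypernyms, "hyponyms": hyponyms}

print(related_terms("temperature"))  # "temperature" is an illustrative concept
```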
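The effect of publishing pre-inferred relationships into the extended QSQL can likewise be shown in miniature: the reasoning work (here, a walk up a toy subclass hierarchy) happens once at publish time, and a query then reduces to a plain lookup. The dictionary below is a deliberately simplified stand-in for the QSQL data structure, and all concept and service names are hypothetical.

```python
# Sketch of publish-time pre-reasoning with a toy concept hierarchy; the real
# QSQL structure is richer, this only illustrates moving reasoning to publish time.
from collections import defaultdict

subclass_of = {                       # child -> parent (hypothetical hierarchy)
    "SeaSurfaceTemperature": "Temperature",
    "Temperature": "OceanVariable",
}
services = {                          # service -> concept it is annotated with
    "SST-Service": "SeaSurfaceTemperature",
    "Argo-Service": "Temperature",
}

def publish(services, subclass_of):
    """Build the extended query list: each concept maps to every service
    annotated with it or with any of its subconcepts (closure done here)."""
    index = defaultdict(set)
    for svc, concept in services.items():
        c = concept
        while c is not None:          # walk up the hierarchy once, at publish time
            index[c].add(svc)
            c = subclass_of.get(c)
    return index

qsql_like_index = publish(services, subclass_of)
# Query stage: no reasoning needed, just a lookup.
print(qsql_like_index["Temperature"])  # {'Argo-Service', 'SST-Service'}
```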
In the data preprocessing stage, to address the weak generality and low efficiency of existing dimensionality reduction methods for high-dimensional data, this dissertation optimizes the distributed representation model of data and proposes FVec2vec, a fast dimensionality reduction method based on neighborhood exploration and a word-vector model. FVec2vec is inherently a non-linear dimensionality reduction method that aims to preserve the neighborhood structure of the data. It comprises three stages: neighborhood calculation based on neighbor exploration, context generation based on random sampling, and low-dimensional embedding calculation based on a word-vector model. In the first stage, the approximate neighborhood structure of the data points in the high-dimensional space is constructed by exploring the similarities between data points and their neighbors' neighbors, which avoids computing pairwise similarities between all data points and achieves accuracy as high as possible while improving computational efficiency. In the second stage, the data points obtained by neighbor exploration are randomly sampled to quickly generate context sequences, further improving computational efficiency. In the third stage, the efficient word-vector model Skip-gram, which has only one hidden layer, is extended to learn the embedding of the high-dimensional numerical matrix in the low-dimensional space, thereby achieving dimensionality reduction. Experimental results show that FVec2vec not only handles general numerical matrices but is also more efficient than the comparable method Vec2vec; moreover, under certain similarity metrics, its accuracy even exceeds that of the state-of-the-art UMAP method.
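A compact sketch of this three-stage pipeline follows, using NumPy and gensim. It combines a simplified neighbor-exploration refinement (checking neighbors' neighbors instead of all pairs), random sampling of context sequences, and gensim's Skip-gram (sg=1). It is an illustrative approximation under assumed parameter values, not the actual FVec2vec implementation.

```python
# Illustrative three-stage pipeline: approximate kNN by exploring neighbors'
# neighbors, random-sampled contexts, then a Skip-gram embedding via gensim.
import numpy as np
from gensim.models import Word2Vec

def approximate_knn(X, k=5, iters=3, seed=0):
    """Refine random neighbor lists by exploring neighbors' neighbors,
    avoiding the full pairwise-distance computation."""
    rng = np.random.default_rng(seed)
    n = len(X)
    knn = np.array([rng.choice(n, size=k, replace=False) for _ in range(n)])
    for _ in range(iters):
        for i in range(n):
            # Candidate set: current neighbors plus their neighbors.
            cand = np.unique(np.concatenate([knn[i], knn[knn[i]].ravel()]))
            cand = cand[cand != i]
            if len(cand) >= k:        # keep the k closest candidates
                d = np.linalg.norm(X[cand] - X[i], axis=1)
                knn[i] = cand[np.argsort(d)[:k]]
    return knn

def sample_contexts(knn, walks=5, length=3, seed=0):
    """Stage two: randomly sample neighbors to generate context sequences."""
    rng = np.random.default_rng(seed)
    return [[str(i)] + [str(j) for j in rng.choice(knn[i], size=length)]
            for i in range(len(knn)) for _ in range(walks)]

X = np.random.default_rng(0).normal(size=(200, 50))      # toy high-dim matrix
sentences = sample_contexts(approximate_knn(X))
model = Word2Vec(sentences, vector_size=2, window=3,     # sg=1 -> Skip-gram
                 sg=1, min_count=1, epochs=5)
Y = np.array([model.wv[str(i)] for i in range(len(X))])  # 2-D embedding of X
```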
In the real data application stage, to address the limited scalability of current atmospheric circulation spectral models and the resulting difficulty of improving their computational efficiency, this dissertation optimizes the data decomposition model and proposes PAGCM, a highly scalable data parallel computing method with two-dimensional decomposition. The advantages of PAGCM lie in three aspects: first, the analysis of data dependencies that considers the computational characteristics of each calculation stage of the dynamical core; second, the two-dimensional decomposition data model derived from those dependencies; and third, the parallel optimization of each calculation stage. In the data dependency analysis, the dependencies of the six main calculation stages of the dynamical core are analyzed in light of the computational characteristics of each stage: grid-point space calculation, Fourier transform, Legendre transform, spectral space calculation, inverse Legendre transform, and inverse Fourier transform. This analysis provides the basis for two-dimensional data decomposition, so that decomposition is performed along dimensions without data dependencies and additional communication overhead between processors is avoided. In the construction of the two-dimensional decomposition data model, two data decomposition algorithms, a sequential decomposition algorithm and a round-robin decomposition algorithm, are proposed according to the data dependencies of each calculation stage; both are designed to distribute data subsets to the processors as evenly as possible, so as to optimize parallel scalability and achieve load balancing. In the parallel optimization of each calculation stage, the proposed three-dimensional data transposition algorithm is used to optimize the communication of the calculation process, while the proposed global data collection algorithm is used to maintain the energy conservation of the physical process, ensuring the correctness of the simulation results. Experimental results show that PAGCM not only guarantees the correctness of the model simulation results but also effectively improves the parallel scalability and efficiency of the model. |
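The two decomposition strategies described above can be contrasted with a small sketch. Assuming the decomposed dimension is simply indexed 0..n-1 (for example, latitudes or zonal wavenumbers), sequential decomposition assigns contiguous blocks, while round-robin interleaving balances the load when the per-index cost varies systematically, as it does across wavenumbers in a triangular spectral truncation. The function names are mine, not the dissertation's.

```python
# Illustrative contrast of the two decomposition strategies over one dimension.
def sequential_decomposition(n, nprocs):
    """Contiguous blocks whose sizes differ by at most one."""
    base, extra = divmod(n, nprocs)
    out, start = [], 0
    for p in range(nprocs):
        size = base + (1 if p < extra else 0)
        out.append(list(range(start, start + size)))
        start += size
    return out

def round_robin_decomposition(n, nprocs):
    """Index i goes to processor i mod nprocs; interleaving balances work
    when the per-index cost varies systematically along the dimension."""
    return [list(range(p, n, nprocs)) for p in range(nprocs)]

print(sequential_decomposition(10, 3))   # [[0,1,2,3], [4,5,6], [7,8,9]]
print(round_robin_decomposition(10, 3))  # [[0,3,6,9], [1,4,7], [2,5,8]]
```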
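Spectral dynamical cores typically re-decompose data between transform stages, which is where a transposition algorithm pays off. The sketch below shows the generic communication pattern using mpi4py's buffer-based Alltoall: a slab decomposed along one axis is exchanged so that each rank ends up owning a slab along another axis. This illustrates the idea only; it is not the dissertation's three-dimensional transposition algorithm, and the grid sizes are toy values.

```python
# Generic re-decomposition (transpose) between two slab decompositions,
# sketched with mpi4py. Run with: mpiexec -n 4 python transpose_sketch.py
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
P = comm.Get_size()
rank = comm.Get_rank()
nx = ny = 4 * P                       # toy global grid, divisible by P

# Before: this rank owns nx/P contiguous rows (and all ny columns).
rows = np.arange(nx // P) + rank * (nx // P)
local = np.add.outer(rows, np.arange(ny)).astype("d")

# Split the slab into P column blocks, exchange block p with rank p,
# then stack the received blocks so this rank owns ny/P full columns.
send = np.ascontiguousarray(local.reshape(nx // P, P, ny // P).swapaxes(0, 1))
recv = np.empty_like(send)
comm.Alltoall(send, recv)
transposed = recv.reshape(nx, ny // P)  # now decomposed along the column axis

# Row i of `transposed` is global row i, restricted to this rank's columns.
assert transposed.shape == (nx, ny // P)
```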