
Research Of Key Technologies About Geoscience Data Integration And Application Based On Data Warehouse & SOA

Posted on: 2009-06-25
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y Z Wang
GTID: 1100360245463410
Subject: Earth Exploration and Information Technology
Abstract/Summary:
China now has more than 100 kinds of national basic geological databases, holding more than 100 TB of data. These databases form "information islands" and share several drawbacks: they are multi-source, heterogeneous, dispersed, and tightly coupled, they differ in semantics, and they are poorly reusable. If geoscience data can be integrated, they can retain or increase their value and better serve geological prospecting, mineral exploration, resource forecasting, and sustainable development.
This paper studies data modeling, data warehousing, and application integration and sharing in depth, and performs spatial analysis and data mining on the integrated data in the geoscience field. This is the first time that such a complete framework for geoscience data, running from integration, application, and analysis through to one-stop service, has been built systematically.
To meet the requirements of geoscience data integration and application integration, an architecture for integrating and sharing geoscience data and applications is designed. It comprises data sources, spatial data transformation, geoscience spatial data storage, spatial application services, and a one-stop client application. The geoscience spatial data integration flow chart shows two levels of integration: one level integrates data of the same structure by using a data warehouse, and the other integrates heterogeneous data by using SOA, which covers both same-structure data integration and application integration. To keep long-term, complex, and systematic geoscience data integration running smoothly, a bottom-up execution pyramid pattern provides overall planning and a step-by-step method with milestones.
A storage and access solution that combines centralized and distributed management is used for geoscience data and geoscience metadata, based on the common warehouse metadata model, and the physical deployment architecture for them is then given.
This paper designs the geoscience spatial data warehouse (GSDW) model, based on grid storage, to store massive geoscience data uniformly. The GSDW is composed of a national geoscience primitive data warehouse, a national geoscience spatial data warehouse, a georaster data warehouse, and a content data warehouse. The data in the GSDW is organized by specific subject. Subsets of the GSDW can generate many data marts, such as a geological map data mart. The content data warehouse organizes all content data in the GSDW. Full-text search can be used to query it, and a content service interface relates the content to other spatial data and attribute data. Because raster data is very large, it consumes a great deal of disk space and may slow down storage and queries. A new generation of compression technology based on an optimized wavelet algorithm compresses raster data better than earlier methods, and users do not notice the delay introduced by compression and decompression. Color rasters can be compressed to 3 percent of the original size, and black-and-white rasters save 90 percent of storage space, with no visible loss.
The rule database holds a sufficient set of geoscience data extraction rules, which flexibly restrict which geoscience data may enter the data warehouse. To guarantee the quality of the geoscience data entering the data warehouse, the data must undergo strict data cleansing.
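The extraction and cleansing rules described here can be pictured as small predicates and repair functions applied in sequence before loading. The following is a minimal sketch in Python, not the dissertation's implementation; the names `ExtractRule`, `CleanseRule`, `run_pipeline`, the field names, and the sample records are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Optional, Tuple

Record = Dict[str, object]


@dataclass
class ExtractRule:
    """Decides whether a source record may enter the warehouse at all."""
    name: str
    accepts: Callable[[Record], bool]


@dataclass
class CleanseRule:
    """Repairs a record, or returns None to reject it."""
    name: str
    apply: Callable[[Record], Optional[Record]]


def strip_text_fields(rec: Record) -> Optional[Record]:
    # Repair step: remove stray whitespace from every string field.
    return {k: v.strip() if isinstance(v, str) else v for k, v in rec.items()}


def require_valid_coordinates(rec: Record) -> Optional[Record]:
    # Filter step: reject records whose longitude/latitude are missing or out of range.
    lon, lat = rec.get("lon"), rec.get("lat")
    if lon is None or lat is None:
        return None
    if not (-180 <= lon <= 180 and -90 <= lat <= 90):
        return None
    return rec


def run_pipeline(
    records: Iterable[Record],
    extract_rules: List[ExtractRule],
    cleanse_rules: List[CleanseRule],
) -> Tuple[List[Record], List[Record]]:
    """Return (records ready to load, records rejected by any rule)."""
    loaded, rejected = [], []
    for rec in records:
        if not all(rule.accepts(rec) for rule in extract_rules):
            rejected.append(rec)
            continue
        cleaned: Optional[Record] = rec
        for rule in cleanse_rules:
            cleaned = rule.apply(cleaned)
            if cleaned is None:
                break
        if cleaned is None:
            rejected.append(rec)
        else:
            loaded.append(cleaned)
    return loaded, rejected
```

Keeping the rules as data rather than hard-coded checks is one way to get the flexibility the abstract attributes to the rule database: a rule can be added or removed without changing the loader.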
The data cleansing rules filter and correct problematic source data, which safeguards the quality and correctness of future geoscience decisions.
The model is built with the object-oriented features of the Oracle database: vector spatial data is stored in fields of type SDO_GEOMETRY, raster spatial data in fields of type SDO_GEORASTER, and attribute data in fields of regular object types (such as AddressType). This paper presents many diagrams of the principles for mapping from the source data model to the destination data model, such as one-to-one, one-to-many, and many-to-many. They serve as templates for transforming heterogeneous structures into the same structure. Identification data of placer samples is used to verify that batch loading performs much better than loading a single record at a time; the results show that it is 30 times faster on average. To meet the requirements of complex geoscience data mapping, transformation, and loading, this paper designs the GeoSpatialETL model based on UML, and the ETL tool is then implemented in an object-oriented programming language based on the MVC (Model-View-Controller) pattern. After the features of the GSDW are analyzed, a parallel development pattern with feedback is adopted as an effective way to build the GSDW.
Based on an analysis of the spatial data cube principle, this paper builds a geoscience spatial data cube with spatial dimensions and spatial measures, so that users can observe and analyze geoscience data from multiple angles.
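To make the idea of viewing a data cube from multiple angles concrete, the sketch below aggregates a tiny fact table along a spatial hierarchy (region, then basin) and a time dimension. It is an illustration under stated assumptions, not the dissertation's cube: the fact rows, the dimension names, and the functions `roll_up` and `drill_down` are hypothetical, and the measure is a plain number, whereas a true spatial measure would typically be a geometry.

```python
from collections import defaultdict

# Hypothetical fact rows: two spatial dimension levels (region > basin),
# one time dimension (year), and one numeric measure (reserves).
FACTS = [
    {"region": "North", "basin": "Basin A", "year": 2005, "reserves": 10.0},
    {"region": "North", "basin": "Basin A", "year": 2006, "reserves": 12.0},
    {"region": "North", "basin": "Basin B", "year": 2005, "reserves": 5.0},
    {"region": "South", "basin": "Basin C", "year": 2005, "reserves": 7.0},
    {"region": "South", "basin": "Basin C", "year": 2006, "reserves": 8.0},
]


def roll_up(facts, dimensions, measure):
    """Sum `measure` grouped by the listed dimensions; all others are aggregated away."""
    totals = defaultdict(float)
    for row in facts:
        key = tuple(row[d] for d in dimensions)
        totals[key] += row[measure]
    return dict(totals)


def drill_down(facts, fixed, dimensions, measure):
    """Restrict to rows matching `fixed`, then group by finer-grained dimensions."""
    subset = [r for r in facts if all(r[k] == v for k, v in fixed.items())]
    return roll_up(subset, dimensions, measure)


# Coarse view: total reserves per region.
by_region = roll_up(FACTS, ["region"], "reserves")

# Finer view: inside the North region, totals per basin.
north_by_basin = drill_down(FACTS, {"region": "North"}, ["basin"], "reserves")
```

Grouping by a shorter list of dimensions gives the coarser view; fixing one member and grouping by the next level of the hierarchy gives the finer one.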
Oil mine resource data is used to perform geoscience OLAP operations such as roll-up and drill-down.
After a systematic introduction to the SOA framework, a geoscience application integration framework based on SOA is built to integrate existing components with new ones. SOA has several important components, such as Web Service Manager, Enterprise Service Bus, Business Process Execution Language, and Business Activity Monitoring.
The geoscience data manipulation platform is built on SOA with XML Web Services as the main technology; it is language-independent, cross-platform, cross-database, and loosely coupled. This paper gives not only the component model of the geoscience service but also the following patterns: the process of mapping components to services, the attribute data integration pattern based on SOA, the spatial data integration model based on GML and SOA, the integration pattern for GIS data of the same structure, the integration pattern for heterogeneous GIS data, the pattern for accessing geoscience spatial data through Web Feature Service, the sharing pattern based on Catalog Service for the Web, and the comprehensive geoscience data service based on SOA (in which spatial, attribute, raster, and content data work together). In addition, a full-database search component is implemented that searches for keywords across the whole database and returns the most detailed and accurate results.
The data returned by a Web Service call is in XML format and is very large. Optimizing the XML result can improve query speed and reduce the load on the server. This paper gives two new optimized sharing patterns. The first restructures the query result of the Web Service: the data structure information appears only once, in the head of the XML file, and what follows is the actual data without XML structural markup.
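The restructuring idea, stating the field structure once and then sending bare rows, can be sketched as follows. This is a minimal illustration rather than the dissertation's format: the functions `compact` and `restore`, the tab-separated layout, and the element names are assumptions, and the sketch only handles flat records that all share the same fields and contain no tabs or newlines in their values.

```python
import xml.etree.ElementTree as ET


def compact(xml_text, sep="\t"):
    """Turn <root><rec><f1>..</f1>..</rec>..</root> into a header line plus data rows."""
    root = ET.fromstring(xml_text)
    records = list(root)
    if not records:
        return ""
    # The structure (field names) is written once, taken from the first record.
    fields = [child.tag for child in records[0]]
    lines = [sep.join(fields)]
    for rec in records:
        lines.append(sep.join((rec.findtext(f) or "") for f in fields))
    return "\n".join(lines)


def restore(compact_text, root_tag="result", record_tag="record", sep="\t"):
    """Rebuild an XML document on the client from the header line and data rows."""
    header, *rows = compact_text.split("\n")
    fields = header.split(sep)
    root = ET.Element(root_tag)
    for row in rows:
        rec = ET.SubElement(root, record_tag)
        for field, value in zip(fields, row.split(sep)):
            ET.SubElement(rec, field).text = value
    return ET.tostring(root, encoding="unicode")


# Hypothetical query result with two records.
SAMPLE_XML = (
    "<result>"
    "<record><id>1</id><name>Site A</name><ore>Cu</ore></record>"
    "<record><id>2</id><name>Site B</name><ore>Au</ore></record>"
    "</result>"
)

packed = compact(SAMPLE_XML)
rebuilt = restore(packed)
```

The saving comes from no longer repeating every opening and closing tag for every record, so it grows with the number of records and the length of the tag names.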
The restructured data is compressed before it leaves the server, then decompressed and rebuilt on the client before it is displayed. Test results show that the data volume can be reduced to 1/120 of the original XML file. The second new sharing pattern uses a socket server. In this pattern, multithreading, queues, a file buffer pool, and compression greatly improve the performance of massive data access.
After geological data has been integrated, analysis and data mining can find valuable patterns or laws hidden in the massive geoscience data. The basic principles of spatial analysis are discussed, and basic algorithms for buffer analysis and overlay analysis are given. Traditional factor analysis and cluster analysis are applied to river samples collected at the Dexing Copper Mine, and the environmental situation is evaluated objectively and comprehensively. A preliminary evaporite discrimination model is constructed, and a series of evaporite discrimination rules is set up. Evaporite lithology discrimination is performed on the basis of this knowledge; an evaporite sedimentary rhythm cycle model is found and applied in the lithology discrimination process, which effectively resolves the problem of multiple lithology solutions and gives more satisfactory results than before.
Data integration and application integration are preliminarily realized; the data include mineral resource potential, mineral deposits, basic geography, geological maps, and so on. The object-oriented types of Oracle are used to establish the core data model of the data warehouse. Two different SOA servers are called by all kinds of clients to share both same-structure and heterogeneous data as well as applications. The structure of the integrated geological system is built on the MVC pattern, and an integrated platform is established.
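The buffer analysis mentioned above amounts to selecting the features that lie within a given distance of a reference feature. The abstract does not reproduce the author's algorithm, so the following is only a generic sketch under stated assumptions: planar coordinates in consistent units, a polyline as the reference feature, and hypothetical names (`point_segment_distance`, `in_buffer`, `buffer_select`) and sample data.

```python
import math


def point_segment_distance(p, a, b):
    """Shortest planar distance from point p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0:
        # Degenerate segment: a and b coincide.
        return math.hypot(px - ax, py - ay)
    # Project p onto the line through a and b, clamped to the segment ends.
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def in_buffer(p, polyline, radius):
    """True if p lies within `radius` of any segment of the polyline."""
    return any(
        point_segment_distance(p, a, b) <= radius
        for a, b in zip(polyline, polyline[1:])
    )


def buffer_select(points, polyline, radius):
    """Names of the points that fall inside the buffer around the polyline."""
    return [name for name, p in points.items() if in_buffer(p, polyline, radius)]


# Hypothetical data: a straight river reach and three sample sites.
river = [(0.0, 0.0), (10.0, 0.0)]
samples = {"s1": (5.0, 1.0), "s2": (5.0, 3.0), "s3": (11.0, 0.0)}
selected = buffer_select(samples, river, radius=2.0)
```

A production system would normally delegate this to the spatial database or a geometry library and would account for the coordinate reference system; the sketch only shows the distance test at the core of the operation.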
Finally, several result diagrams are shown: the national oil resource overlay analysis result, the national coal resource buffer analysis, the seamless integrated search result of the potential database and the mine database, the integrated analysis result of the supply analysis system, the integrated result of the national oil and gas database, the integrated result of the evaporite discrimination system, and so on. In combination with the geological background, the distribution of oil and coal resource potential is summarized, and their future utilization, exploitation, and availability are analyzed and forecast.
The implementation and application in the "golden soil project" show that the research results of this paper achieve the expected goals of geological data integration and application integration. The work has great practical significance and provides a valuable reference for future data integration and application integration in the geological field.
Keywords/Search Tags: geoscience data integration and share, data warehouse, Service-Oriented Architecture, spatial analysis, data mining