Research On The Key Technologies Of Geological Big Data Representation And Association

Posted on: 2019-02-20 | Degree: Doctor | Type: Dissertation
Country: China | Candidate: K Ma | Full Text: PDF
GTID: 1360330596963115 | Subject: Resources and Environment Remote Sensing
Abstract/Summary:
For a long time, a large amount of data has been accumulated in the field of geological surveys. China has built 48 national geological databases in 10 categories, with a data volume of over 700 TB, and geological work has entered a data-intensive mode. Geological big data research has also received unprecedented attention at home and abroad. Geological survey institutions in the United States, Britain, and other countries have recognized the importance of geological big data research and application and have developed corresponding action plans. Meanwhile, China has started building geological cloud platforms.

Geological big data is a kind of spatiotemporal big data. Using big data technology to mine knowledge directly from massive geological data can overcome the limitations of traditional data analysis methods, which rely on random sampling and a narrow sample space, and can thus promote data-driven geological discovery and intelligent services. It may also lead to new discoveries in geological science by improving on the current state of traditional geological data applications and their insufficient collaborative services.

The representation and association of big data have been widely discussed, but research on the representation and association of geological big data remains insufficient. Geological objects are characterized by “incomplete parameter information, incomplete structural information, incomplete relationship information, and incomplete evolution information”. It is therefore especially important to associate information from various sources. Association first requires a reasonable representation of the objects to be associated. Conversely, once associated, geological objects can be represented together with their associated structures, attributes, and relationships, which supports semantic query, clustering, and other tasks. This dissertation focuses on the association between geological spatial entity objects and their external descriptive texts; associating spatial entities with text data enables a new paradigm of geological data application, “graphics-text mutual query”. In addition, named entities have previously been extracted from geological texts, but the extracted entities themselves have received little study.

Based on the information service demands of geological big data, this dissertation applied representation learning models and studied in depth the calculation of semantic similarity between text data and spatial data in the geological field. A prototype system with practical functions was then built, providing a new method for geological data integration and a new paradigm for geological data extraction and application. The main work of this dissertation is as follows:

(1) Characteristic analysis of geological big data and a review of related representation techniques. The composition of the geological big data under study and the related representation techniques were summarized and analyzed in order to clarify how geological big data is organized and managed. The characteristics and current representation techniques of geological spatial big data and geological text big data were sorted out. The feasibility of introducing natural language processing techniques to represent geological spatial entities and text objects was also discussed.

(2) Hierarchical semantic representation of geological spatial entities based on sentence vector combination. Although a geological spatial entity and its related text description both represent the same object, information asymmetry and inconsistent semantic expression patterns arise when constructing the association between them. This dissertation selected the paragraph as the representation granularity of geological text objects and defined the concept of a rich-text geological spatial entity. A hierarchical semantic representation of geological spatial entities based on sentence vector combination was then designed. This representation preserves the topological and attribute features of geological spatial entities and maps both into a unified semantic space, resolving the inconsistency between the semantic expression of geological spatial entities and that of geological texts.

(3) A Siamese hierarchical attention network for matching geological spatial entities with description texts. Building on the representations of geological spatial entities and geological texts, a Siamese hierarchical attention network (SHAN) was proposed for the matching problem. The model avoids feature engineering such as complex named entity recognition and grammatical and semantic analysis, and effectively learns low-dimensional, real-valued semantic vector representations of the two types of objects that are oriented to the association task. During training, the model minimizes the distance between matching sample pairs and maximizes the distance between non-matching sample pairs. The experimental results indicated that the SHAN model achieved better performance.

(4) Construction and representation of a geological entity information network based on ontology mapping. To address the lack of association between geological entities in geological text information extraction, a geological domain ontology library was designed. Geological named entity annotation, recognition, and extraction were carried out based on the ontology library, and a geological entity information network was then constructed. Analysis of the network structure showed that it has hyper-edge properties. A star-shaped geological entity information network model was defined based on geological characteristics, and four hyper-edge construction strategies were formulated. A representation learning model was then applied to the constructed network. The model defines first-order proximity between nodes within a hyper-edge, and second-order proximity such that entities with more similar neighboring nodes receive closer representations. The learned entity representations were shown to be usable for multi-label node classification and node proximity queries.

(5) Design and implementation of a prototype system for geological big data representation and association. The prototype system was built by designing the system architecture together with its data processing, access, and computing modes. A multi-type file parser was used to merge and unify various types of geological text data. A storage strategy for massive numbers of small, fragmented files was proposed and integrated with common functions such as word segmentation and vectorization. An efficient index model was constructed to support efficient retrieval of geological big data. Tests of location-aware services and graph-text association queries were conducted on the platform, and the results were in line with expectations.
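
The training objective described in item (3), which pulls matching pairs together and pushes non-matching pairs apart, can be illustrated with a margin-based contrastive loss. The abstract does not state which loss or distance SHAN uses, so the Python sketch below is an assumption and not the dissertation's implementation; the function names, the Euclidean distance, and the `margin` parameter are hypothetical choices made for illustration.

```python
import math


def euclidean_distance(u, v):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def contrastive_loss(u, v, is_match, margin=1.0):
    """Margin-based contrastive loss for one (entity, text) embedding pair.

    Assumed form (not taken from the dissertation):
      matching pair:      0.5 * d^2                  -> shrinks the distance
      non-matching pair:  0.5 * max(0, margin - d)^2 -> grows the distance
                                                        until it exceeds margin
    """
    d = euclidean_distance(u, v)
    if is_match:
        return 0.5 * d ** 2
    return 0.5 * max(0.0, margin - d) ** 2


# Hypothetical embeddings of a spatial entity and two candidate texts.
entity_vec = [0.0, 0.0]
near_text_vec = [0.0, 0.0]
far_text_vec = [3.0, 4.0]

loss_match_near = contrastive_loss(entity_vec, near_text_vec, is_match=True)
loss_match_far = contrastive_loss(entity_vec, far_text_vec, is_match=True)
loss_nonmatch_near = contrastive_loss(entity_vec, near_text_vec, is_match=False)
loss_nonmatch_far = contrastive_loss(entity_vec, far_text_vec, is_match=False)
```

In this sketch a matching pair is penalized in proportion to its squared distance, while a non-matching pair is penalized only while it lies inside the margin. In a Siamese setup, both embeddings would come from the same shared-weight encoder, and the loss would be averaged over a batch of labeled pairs.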
Keywords/Search Tags: Geological big data, Geological entity information network, Geological entity, Geological text, Geological ontology