Research On Key Technologies For Constructing Sediment Knowledge Graph Based On Natural Language Processing

Posted on: 2024-08-25 | Degree: Master | Type: Thesis
Country: China | Candidate: Z C Hu
GTID: 2530307106982079 | Subject: Electronic information
Abstract/Summary:
Sediment data reveal the evolution of life on Earth, and the critical information (sedimentation time, sediment type, geospatial information, etc.) is usually recorded in scientific and technical literature. Natural language processing techniques such as named entity recognition and relationship extraction can promote sediment knowledge sharing by mining the key sediment information in the literature and extracting the associations among entities to construct a sediment knowledge graph. However, sedimentological literature is multi-source and heterogeneous. As a result, the volume of sediment entity data to extract is huge, extraction is time-consuming, semantic relationships are tightly coupled across literature contexts, and relationships between entities are hard to discriminate accurately.

Sediment knowledge graph construction therefore faces the following challenges: (1) No sediment corpus is available, and entities are lexically ambiguous, so an effective named entity recognition model cannot be built. (2) Sediment relationship extraction lacks prior knowledge from a domain knowledge system and characterizes hidden relationships weakly, so models cannot extract these relationships accurately. In view of this, this thesis studies the key technologies for constructing sediment knowledge graphs based on natural language processing. The research content mainly includes:

(1) To solve the problems of the missing corpus and entity lexical ambiguity in the sediment domain, this thesis proposes a named entity recognition model that combines lexical regularity analysis with a Bidirectional Long Short-Term Memory network and Conditional Random Fields (BiLSTM-CRF). First, an expert lexicon and a lexical regularity matching formula are designed from the features of sedimentation phrases, yielding a sedimentology-specific word segmentation algorithm. Second, the resulting prior knowledge of the sedimentology domain is applied to multi-source heterogeneous literature to extract phrases and their lexical categories, constructing a structured sediment corpus. Finally, to resolve entity lexical ambiguity, lexical category information is combined with BiLSTM-CRF, which keeps entity recognition stable. The experimental results show that the model reduces entity recognition latency, that the corpus is scalable, and that the discrimination of unregistered (out-of-vocabulary) words improves.

(2) Traditional relationship extraction models characterize contextual relationships weakly and make poor use of prior knowledge, which leads to poor interference resistance in cross-sentence relationship extraction. To address this, this thesis proposes a clustering relationship extraction model based on a Bidirectional Gated Recurrent Unit network with Conditional Random Fields and an attention mechanism (BiGRU-CRF-Att). First, the sediment named entity recognition model serves as an upstream task to obtain the contextual sentences that contain sediment entities. Next, inter-entity relationships are mined with BiGRU-CRF-Att. Finally, the relationship attributes between sediments are extended by a clustering algorithm. The experimental results show that the model improves robustness to redundant noise between contextual sentences while extracting hidden relationships.
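The abstract does not give the expert lexicon or the matching formula, so the following is only a minimal sketch of the general idea of lexicon-driven segmentation, not the thesis's algorithm. It uses plain forward maximum matching, a standard dictionary-based technique swapped in for illustration. The lexicon entries, the category labels, and the `segment` function are all hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical expert lexicon: phrase -> lexical category.
# These entries and labels are illustrative assumptions, not taken from the thesis.
LEXICON: Dict[str, str] = {
    "silty clay": "SEDIMENT_TYPE",
    "clay": "SEDIMENT_TYPE",
    "late pleistocene": "TIME",
    "yellow sea": "LOCATION",
}


def segment(tokens: List[str], lexicon: Dict[str, str],
            max_len: int = 4) -> List[Tuple[str, str]]:
    """Forward maximum matching: at each position, take the longest
    phrase found in the lexicon; otherwise emit a single token tagged "O"."""
    out: List[Tuple[str, str]] = []
    i = 0
    while i < len(tokens):
        matched = False
        # Try the longest candidate phrase first, then shrink.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n]).lower()
            if phrase in lexicon:
                out.append((phrase, lexicon[phrase]))
                i += n
                matched = True
                break
        if not matched:
            out.append((tokens[i].lower(), "O"))
            i += 1
    return out


result = segment("Silty clay from the Yellow Sea".split(), LEXICON)
for phrase, category in result:
    print(phrase, category)
```

Longest-match-first keeps "silty clay" as one phrase instead of splitting off "clay". In a pipeline like the one the abstract describes, such phrase and category pairs could seed a structured corpus for a downstream sequence tagger, but how the thesis actually does this is not stated here.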
Keywords/Search Tags:Sedimentological knowledge graph, Natural language processing, Named entity recognition, Relationship extraction