Research On Key Technologies For Constructing Sediment Knowledge Graph Based On Natural Language Processing

Posted on:2024-08-25

Degree:Master

Type:Thesis

Country:China

Candidate:Z C Hu

Full Text:PDF

GTID:2530307106982079

Subject:Electronic information

Abstract/Summary:

Sediment data reveal the evolution of life on Earth, and the critical information (sedimentation time, sediment type, geospatial information, etc.) is usually recorded in scientific and technical literature. Natural language processing techniques (e.g., named entity recognition and relationship extraction) can promote sediment knowledge sharing by mining the key sediment information in the literature and extracting the associations among entities to construct a sediment knowledge graph. However, sedimentological literature is multi-source and heterogeneous: the volume of sediment entity data to extract is huge, extraction is time-consuming, the semantic relationships between literature contexts are tightly coupled, and the relationships between entities are hard to discriminate accurately. Sediment knowledge graph construction therefore faces the following challenges: (1) No sediment corpus is available and entities are lexically ambiguous, so an effective named entity recognition model cannot be built. (2) Sediment relationship extraction lacks prior knowledge of the domain knowledge system and characterizes hidden relationships weakly, so models cannot extract them accurately.

In view of this, this thesis studies the key technologies for constructing sediment knowledge graphs based on natural language processing. The main research content is as follows:

(1) To address the missing corpus and entity lexical ambiguity in the sediment domain, this thesis proposes a named entity recognition model that combines lexical regularity analysis with a Bidirectional Long Short-Term Memory network and Conditional Random Fields (BiLSTM-CRF). First, an expert lexicon and a lexical regularity matching formula are designed based on the features of sedimentation phrases, yielding a sedimentology-based sentence word segmentation algorithm. Second, the resulting prior knowledge of the sedimentology domain is applied to multi-source heterogeneous literature to extract phrases and their lexical categories and to construct a structured sediment corpus. Finally, to resolve entity lexical ambiguity, lexical category information is combined with BiLSTM-CRF, which stabilizes entity recognition. The experimental results show that the model reduces the delay of entity recognition, that the corpus is scalable, and that the discrimination of unregistered (out-of-vocabulary) words improves.

(2) To address the problem that traditional relationship extraction models characterize contextual relationships weakly and struggle to use prior knowledge, which leads to poor interference resistance in cross-sentence relationship extraction, this thesis proposes a clustering-based relationship extraction model built on a Bidirectional Gated Recurrent Unit network with Conditional Random Fields and an attention mechanism (BiGRU-CRF-Att). First, the sediment named entity recognition model serves as an upstream task to obtain the contextual sentences containing sediment entities. Next, inter-entity relationships are mined with BiGRU-CRF-Att. Finally, the relationship attributes between sediments are extended by a clustering algorithm. The experimental results show that the model is more robust to redundant noise between contextual sentences while still extracting hidden relationships.
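As an illustration of the lexicon-driven corpus construction described in contribution (1), the following is a minimal sketch, not the thesis's actual algorithm. It assumes forward maximum matching against a small expert lexicon and emits BIO labels of the kind a BiLSTM-CRF tagger is typically trained on. The lexicon entries, the entity type names (`TIME`, `SEDIMENT`, `LOCATION`), and the function name `bio_tag` are all hypothetical and chosen only for the example; the abstract does not specify the matching formula or the label scheme.

```python
from typing import Dict, List, Tuple

# Hypothetical expert lexicon (an assumption for this sketch):
# phrase as a tuple of lowercase tokens -> entity type.
LEXICON: Dict[Tuple[str, ...], str] = {
    ("late", "cretaceous"): "TIME",
    ("sandstone",): "SEDIMENT",
    ("silty", "mudstone"): "SEDIMENT",
    ("ordos", "basin"): "LOCATION",
}
MAX_PHRASE_LEN = max(len(phrase) for phrase in LEXICON)


def bio_tag(tokens: List[str]) -> List[Tuple[str, str]]:
    """Label tokens with BIO tags using forward maximum matching."""
    tagged: List[Tuple[str, str]] = []
    i = 0
    while i < len(tokens):
        match_len, label = 0, ""
        # Try the longest candidate phrase first, then shrink the window.
        for n in range(min(MAX_PHRASE_LEN, len(tokens) - i), 0, -1):
            key = tuple(t.lower() for t in tokens[i:i + n])
            if key in LEXICON:
                match_len, label = n, LEXICON[key]
                break
        if match_len:
            # First token of a match gets B-, the rest get I-.
            tagged.append((tokens[i], "B-" + label))
            for tok in tokens[i + 1:i + match_len]:
                tagged.append((tok, "I-" + label))
            i += match_len
        else:
            # Tokens outside the lexicon are labeled O.
            tagged.append((tokens[i], "O"))
            i += 1
    return tagged


if __name__ == "__main__":
    sentence = "Silty mudstone and sandstone occur in the Ordos Basin"
    for token, tag in bio_tag(sentence.split()):
        print(f"{token}\t{tag}")
```

Sentences labeled this way could serve as silver-standard training data for a sequence tagger. Preferring the longest match keeps multi-word phrases such as "silty mudstone" intact instead of splitting them into shorter entries.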

Keywords/Search Tags:

Sedimentological knowledge graph, Natural language processing, Named entity recognition, Relationship extraction

Related items

1	Named Entity Recognition And Relationship Extraction Based On Biomedical Domain Knowledge Enhancement
2	Research On The Application Of Deep Learning Models In Geographic Named Entity Recognition
3	Research On Oil And Gas Knowledge Graph Platform Based On Deep Learning
4	Research On Entity Linking And Prediction Methods For Marine Economic Industry
5	Research Of Knowledge Extraction Method For Biomedical Literature Based On Graph Neural Network And Its Application
6	Research On Virus Named Entity Recognition Methods Based On Language And Distantly Supervised Model
7	Named Entity Identification For The Construction Of A Knowledge Graph In The Field Of Endangered Wild Mammals In China
8	Research On Extraction Method And Application Of Geological Entity Relationships
9	Research And Application Of Biomedical Named Entity Recognition Based On Reinforcement Learning
10	Drug-Disease Treatment Relationship Discovery Based On Named Entity Recognition And Network Link Prediction