Document Relationship Extraction Algorithm Based On Semantic Information Embedding

Posted on: 2024-02-03
Degree: Master
Type: Thesis
Country: China
Candidate: H J Yin
GTID: 2568307079459694
Subject: Computer Science and Technology
Abstract/Summary:
Document-level relation extraction is a fundamental task in natural language processing. It aims to extract relations between entities from multiple semantically continuous sentences, which involve richer semantic interactions than sentence-level relation extraction. By producing structured knowledge in the form of triples, it serves downstream natural language processing tasks such as knowledge graph construction, dialogue systems, and sentiment analysis. Most existing document-level relation extraction techniques model the interactions between entity mentions in a document to obtain contextual feature embeddings. However, when extracting relations for cross-sentence entity pairs, especially long-range pairs in complex semantic scenarios, the information carried by the document's sentence features must be further filtered, and the relational dependencies between entity pairs must be constrained. This thesis therefore studies two aspects: the screening and utilization of sentence information, and the semantic constraints between entity pairs. The main research contributions are as follows.

(1) A structural information embedding method based on graph neural networks is proposed. Because documents provide no explicit basis for screening critical sentence features, the algorithm designs a sentence contribution scoring mechanism to reconstruct the heterogeneous mention-relation graph, selects the embeddings of critical sentences as additional semantics, and fuses them with entity features, reducing the influence of non-critical sentence embeddings on entity-pair relation classification. In comparison experiments against mainstream relation extraction algorithms such as SSAN, LSR, and HIN, the F1 and Ign F1 scores (Ign F1 is F1 computed after excluding relational facts shared with the training set) on the DocRED dataset improve by more than 1.2%.

(2) A relation extraction method based on sampling relation-dependent information is proposed. To address the false positives caused by a global classification threshold and by class imbalance, the method introduces a self-attention mechanism to sample the entity-pair association matrix and designs an adaptive threshold mechanism that determines a classification boundary for each category, improving the model's predictions across categories. In comparison experiments against mainstream relation extraction algorithms such as ATLOP, SSAN, and GCGCN, the Ign F1 and F1 scores on the DocRED dataset improve by about 0.7% and 0.5% or more, respectively.

(3) A prototype system for structuring document-level text is implemented. By integrating the graph-neural-network-based structural information embedding method with the relation extraction method based on sampling relation-dependent information, the system meets the relation extraction requirements of document-level corpora and verifies the usability of the algorithms.
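The abstract does not say how the sentence contribution score in contribution (1) is computed. Purely as an illustration, the Python sketch below assumes a score equal to a softmax over dot products between each sentence embedding and the mean of the head and tail entity embeddings, keeps the top-k sentences as key sentences, and concatenates their score-weighted average with the entity features so that non-key sentences contribute nothing. The function names (`score_sentences`, `select_key_sentences`, `fuse`), the parameter `k`, and the toy vectors are hypothetical; the heterogeneous graph reconstruction and the GNN encoder are not modeled.

```python
import math
from typing import List, Sequence

Vector = List[float]


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def score_sentences(sentences: Sequence[Vector], head: Vector, tail: Vector) -> List[float]:
    """Hypothetical contribution score: softmax over the dot product between
    each sentence embedding and the mean of the head/tail entity embeddings."""
    query = [(h + t) / 2 for h, t in zip(head, tail)]
    raw = [sum(s_d * q_d for s_d, q_d in zip(s, query)) for s in sentences]
    return softmax(raw)


def select_key_sentences(scores: Sequence[float], k: int) -> List[int]:
    """Keep the k highest-scoring sentences, returned in document order."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


def fuse(sentences: Sequence[Vector], scores: Sequence[float], keep: Sequence[int],
         head: Vector, tail: Vector) -> Vector:
    """Concatenate head, tail, and the score-weighted mean of the kept sentences;
    sentences outside `keep` do not enter the context vector at all."""
    total = sum(scores[i] for i in keep)
    context = [sum(scores[i] * sentences[i][d] for i in keep) / total
               for d in range(len(head))]
    return list(head) + list(tail) + context


# Toy document with three 2-dimensional sentence embeddings (hypothetical values).
sentences = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
head, tail = [1.0, 0.0], [1.0, 0.0]

scores = score_sentences(sentences, head, tail)
keep = select_key_sentences(scores, k=2)  # sentence 1 is dropped: keep == [0, 2]
pair_feature = fuse(sentences, scores, keep, head, tail)  # 2 + 2 + 2 = 6 values
print(keep, pair_feature)
```

Normalizing the context by the kept sentences' total score (rather than by all sentences) is one design choice among several; it keeps the context vector on the same scale regardless of how much probability mass the discarded sentences held.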
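Contribution (2) describes its adaptive threshold only at a high level. For contrast with a single global threshold, the sketch below shows the adaptive thresholding scheme of ATLOP, one of the baselines named above, not necessarily the thesis's own design: a learned threshold class `TH` receives its own logit for every entity pair, a relation is predicted only when its logit exceeds that pair's `TH` logit, and the loss ranks positive relations above `TH` and `TH` above negative relations. Note that ATLOP's threshold varies per entity pair rather than per relation category, so it is a stand-in for the category-specific boundaries the abstract mentions. The relation labels, logits, and helper names are hypothetical, and the self-attention sampling of the entity-pair matrix is not shown.

```python
import math
from typing import Dict, Set

TH = "TH"  # extra "threshold" class, as in ATLOP-style adaptive thresholding


def predict_global(logits: Dict[str, float], threshold: float) -> Set[str]:
    """Baseline: one fixed cutoff shared by every relation and entity pair."""
    return {r for r, v in logits.items() if r != TH and v > threshold}


def predict_adaptive(logits: Dict[str, float]) -> Set[str]:
    """Predict a relation only if its logit beats this pair's TH logit."""
    th = logits[TH]
    return {r for r, v in logits.items() if r != TH and v > th}


def log_softmax_at(target: str, logits: Dict[str, float], classes: Set[str]) -> float:
    """log p(target) under a softmax restricted to `classes` (stable log-sum-exp)."""
    m = max(logits[c] for c in classes)
    lse = m + math.log(sum(math.exp(logits[c] - m) for c in classes))
    return logits[target] - lse


def adaptive_threshold_loss(logits: Dict[str, float], positives: Set[str]) -> float:
    """ATLOP-style loss for one entity pair: the first term ranks each positive
    relation above TH, the second ranks TH above every negative relation."""
    negatives = {r for r in logits if r != TH and r not in positives}
    loss_pos = -sum(log_softmax_at(r, logits, positives | {TH}) for r in positives)
    loss_neg = -log_softmax_at(TH, logits, negatives | {TH})
    return loss_pos + loss_neg


# Hypothetical logits for two entity pairs whose scores live on different scales.
pair_a = {TH: 2.0, "rel_a": 3.1, "rel_b": 1.5}
pair_b = {TH: -1.0, "rel_a": -0.5, "rel_b": -2.0}

# Global cutoff 0.0: both relations for pair_a (one is a false positive), none for pair_b.
print(predict_global(pair_a, 0.0), predict_global(pair_b, 0.0))
# Adaptive: only rel_a clears the pair-specific TH logit, for both pairs.
print(predict_adaptive(pair_a), predict_adaptive(pair_b))
print(adaptive_threshold_loss(pair_a, {"rel_a"}))
```

The two pairs are treated inconsistently under the global cutoff only because their logits sit on different scales; a per-pair `TH` logit absorbs that difference, which is the kind of false positive the abstract attributes to a global threshold.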
Keywords/Search Tags: Relation extraction, document-level, graph neural networks, natural language processing, relation sampling