Research On Document-level Entity Recognition And Relation Extraction Method

Posted on: 2024-05-21
Degree: Master
Type: Thesis
Country: China
Candidate: C X Guan
GTID: 2568307115963849
Subject: Computer Science and Technology

Abstract/Summary:
In recent years, the amount of data on the Internet has grown explosively. Much of the information in this data is implicit, which makes it harder to acquire, so extracting valuable information from such data has become a research hotspot. Information extraction, which obtains information by parsing text, is one of the main ways to do this, and Named Entity Recognition (NER) and Relation Extraction (RE) are two of its subtasks. Current studies of both tasks focus mainly on sentences. However, in domains such as medicine and finance, the original corpora usually consist of documents. Compared with sentences, documents provide richer descriptions of entities and of the interactions between them. This thesis therefore studies the following challenges in document-level NER and document-level RE:

(1) Label inconsistency: the same entity is often mentioned in different forms within a document. If entities are recognized sentence by sentence, different mentions of the same entity may be assigned different entity types.

(2) Variable forms of entity mentions and insufficient reasoning ability in cross-sentence relation extraction: on the one hand, the relation between an entity pair in a document may span several sentences and must be inferred from all of them; on the other hand, logical relations may exist between different relational facts, so logical inference ability is required. Existing sentence-level models have difficulty addressing these challenges.

(3) Lack of annotated datasets: few annotated document-level NER and RE datasets are available, and building them by manually extracting and summarizing information is time-consuming and labor-intensive.

To address these challenges, the main work of this thesis is as follows:

(1) To address label inconsistency, a document-level NER model based on a double graph (DNER-DG) is proposed. DNER-DG first constructs a word-level graph and a sentence-level graph corresponding to the word and sentence levels of a document, and uses two node-updating strategies to update the nodes of the two graphs. Word representations that fuse document-level information are then obtained from the word nodes and sentence nodes. Finally, these word representations are fed into a BiLSTM and a CRF to predict the label sequence. Experimental results show that DNER-DG improves F1 over the baseline models on the CoNLL-2003 and DocRED datasets, which validates the effectiveness of the model.

(2) To address variable mention forms and insufficient reasoning ability in cross-sentence relation extraction, a document-level entity RE model based on relational graph convolutional networks (DERE-R-GCNs) is proposed. DERE-R-GCNs consists of an encoding layer, a construction layer, an inference layer and a classification layer. The encoding layer encodes documents with a BiLSTM. The construction layer builds a heterogeneous graph from predefined node types and edge types and updates its nodes with R-GCNs. The inference layer performs inference over edges with an iterative algorithm, producing edge representations between entity pairs. The classification layer predicts the relation of each entity pair from these edge representations. Experimental results show that, compared with the baseline models, DERE-R-GCNs achieves significant improvements in Inter-F1 on the CDR and GDA datasets, indicating that it can effectively exploit document information and accurately extract relations between entities located in different sentences.

(3) To address the lack of document-level NER and RE datasets in specific domains, an entity relation annotation system is built. The system is used to annotate entities and the relations between them, and it exports annotation files in a fixed format. In this thesis, the system is applied to the tourism domain to annotate profiles of tourist attractions; the exported annotation files are then imported into a Neo4j graph database, and a keyword query function is provided in the frontend. The system makes it easier to annotate entities and relations for datasets in various domains and can support research on NER and RE or the construction of small-scale knowledge graphs.
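The double-graph idea behind DNER-DG in contribution (1) can be illustrated with a small sketch. The abstract does not say how the two graphs are built or which two update strategies are used, so everything below is an assumption for illustration only: the word-level graph links every occurrence of the same lower-cased token in the document, the sentence-level graph links adjacent sentences, both updates are plain neighbour averaging, and fusion is concatenation. The functions `build_word_graph` and `double_graph_step` are hypothetical names, not the thesis's code.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Vector = List[float]
Node = Tuple[int, int]  # (sentence index, token index)


def mean(vectors: List[Vector]) -> Vector:
    """Element-wise mean of equally sized vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def build_word_graph(doc: List[List[str]]) -> Dict[Node, List[Node]]:
    """Hypothetical word-level graph: each token occurrence is linked to every
    other occurrence of the same lower-cased token anywhere in the document."""
    occurrences: Dict[str, List[Node]] = defaultdict(list)
    for s, sentence in enumerate(doc):
        for t, token in enumerate(sentence):
            occurrences[token.lower()].append((s, t))
    return {
        node: [other for other in nodes if other != node]
        for nodes in occurrences.values()
        for node in nodes
    }


def double_graph_step(
    doc: List[List[str]],
    word_vecs: List[List[Vector]],
    sent_vecs: List[Vector],
) -> List[List[Vector]]:
    """One illustrative update over both graphs; returns fused word vectors."""
    word_graph = build_word_graph(doc)

    # Word-level update: average each word node with its same-token neighbours.
    new_words = [
        [
            mean([word_vecs[s][t]] + [word_vecs[i][j] for i, j in word_graph[(s, t)]])
            for t in range(len(sentence))
        ]
        for s, sentence in enumerate(doc)
    ]

    # Sentence-level update: average each sentence node with its neighbours in
    # an assumed chain graph (previous and next sentence).
    new_sents = [mean(sent_vecs[max(0, s - 1): s + 2]) for s in range(len(sent_vecs))]

    # Fusion: concatenate each word node with its sentence node; the result
    # would be the input to the BiLSTM-CRF tagger.
    return [
        [new_words[s][t] + new_sents[s] for t in range(len(sentence))]
        for s, sentence in enumerate(doc)
    ]


doc = [["Paris", "is", "big"], ["paris", "grows"]]
word_vecs = [[[1.0, 0.0], [0.0, 0.0], [0.0, 1.0]], [[3.0, 0.0], [0.0, 2.0]]]
sent_vecs = [[1.0], [3.0]]
fused = double_graph_step(doc, word_vecs, sent_vecs)
print(fused[0][0], fused[1][0])  # [2.0, 0.0, 2.0] [2.0, 0.0, 2.0]
```

Because both occurrences of "Paris" are averaged over the same neighbourhood, they receive the same word-level representation, which is the intuition for using document context against label inconsistency.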
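DNER-DG ends by feeding word representations into a BiLSTM and a CRF. At prediction time a linear-chain CRF selects the best label sequence with Viterbi decoding; the sketch below shows standard Viterbi decoding over emission scores (which would come from the BiLSTM) and tag-transition scores. The function name, tag set and all scores are made-up toy values, not taken from the thesis.

```python
from typing import List, Sequence


def viterbi_decode(
    emissions: Sequence[Sequence[float]],
    transitions: Sequence[Sequence[float]],
    start: Sequence[float],
) -> List[int]:
    """Highest-scoring tag sequence under a linear-chain CRF.

    emissions[t][y]   score of tag y at position t (e.g. a BiLSTM output)
    transitions[a][b] score of moving from tag a to tag b
    start[y]          score of starting the sequence with tag y
    Assumes at least one position.
    """
    num_tags = len(start)
    scores = [start[y] + emissions[0][y] for y in range(num_tags)]
    backpointers: List[List[int]] = []

    for t in range(1, len(emissions)):
        best_prev: List[int] = []
        new_scores: List[float] = []
        for y in range(num_tags):
            # Best previous tag for ending at tag y at position t.
            prev = max(range(num_tags), key=lambda a: scores[a] + transitions[a][y])
            best_prev.append(prev)
            new_scores.append(scores[prev] + transitions[prev][y] + emissions[t][y])
        backpointers.append(best_prev)
        scores = new_scores

    # Follow the back-pointers from the best final tag.
    path = [max(range(num_tags), key=lambda y: scores[y])]
    for best_prev in reversed(backpointers):
        path.append(best_prev[path[-1]])
    return path[::-1]


TAGS = ["O", "B-LOC", "I-LOC"]
start = [0.0, 0.0, -10.0]  # a sequence should not start with I-LOC
transitions = [
    [0.0, 0.0, -10.0],  # O -> I-LOC is heavily penalised
    [0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0],
]
emissions = [[1.0, 0.8, 0.0], [0.0, 0.0, 2.0]]
print([TAGS[y] for y in viterbi_decode(emissions, transitions, start)])
# ['B-LOC', 'I-LOC']
```

Picking the best tag at each position independently would give O followed by I-LOC, an invalid sequence; the transition scores let the CRF prefer B-LOC, I-LOC instead.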
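The construction layer of DERE-R-GCNs in contribution (2) builds a heterogeneous graph from predefined node and edge types, which the abstract does not list. The sketch assumes mention (M), entity (E) and sentence (S) nodes with five edge types commonly used in graph-based document-level RE: MM (mentions in the same sentence), ME (mention to its entity), MS (mention to its sentence), ES (entity to a sentence containing one of its mentions) and SS (adjacent sentences). `Mention`, `build_hetero_graph` and the node naming scheme are hypothetical.

```python
from typing import List, NamedTuple, Set, Tuple


class Mention(NamedTuple):
    entity: str    # identifier of the entity this mention refers to
    sentence: int  # index of the sentence containing the mention


Edge = Tuple[str, str, str]  # (node, edge type, node); treated as undirected


def build_hetero_graph(mentions: List[Mention], num_sentences: int) -> Set[Edge]:
    """Typed edges over mention (M*), entity (E:*) and sentence (S*) nodes."""
    edges: Set[Edge] = set()
    for i, m in enumerate(mentions):
        mention, entity, sentence = f"M{i}", f"E:{m.entity}", f"S{m.sentence}"
        edges.add((mention, "ME", entity))    # mention -> its entity
        edges.add((mention, "MS", sentence))  # mention -> its sentence
        edges.add((entity, "ES", sentence))   # entity -> sentence mentioning it
        for j in range(i + 1, len(mentions)):
            if mentions[j].sentence == m.sentence:
                edges.add((mention, "MM", f"M{j}"))  # co-occurring mentions
    for s in range(num_sentences - 1):
        edges.add((f"S{s}", "SS", f"S{s + 1}"))  # adjacent sentences
    return edges


mentions = [Mention("chem1", 0), Mention("dis1", 0), Mention("chem1", 2)]
graph = build_hetero_graph(mentions, num_sentences=3)
print(len(graph))  # 12
```

The two mentions of "chem1" are never linked directly; they meet only through their shared entity node, which is what lets cross-sentence evidence flow in the later layers.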
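DERE-R-GCNs updates graph nodes with R-GCNs. In the standard R-GCN formulation, one layer computes h_i' = ReLU(W_0 h_i + sum over relations r and neighbours j in N_i^r of (1/|N_i^r|) W_r h_j): every edge type has its own weight matrix, and messages are normalised per node and relation. The pure-Python sketch below implements only that basic layer, without basis decomposition or other extensions the thesis may use; the relation names "ME" and "MM", the dimensions and the weights are illustrative assumptions.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Vector = List[float]
Matrix = List[List[float]]
Edge = Tuple[int, str, int]  # (source node, relation, target node)


def matvec(W: Matrix, x: Vector) -> Vector:
    """Matrix-vector product."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def rgcn_layer(
    h: List[Vector],
    edges: List[Edge],
    W_self: Matrix,
    W_rel: Dict[str, Matrix],
) -> List[Vector]:
    """One R-GCN layer:
    h_i' = ReLU(W_self h_i + sum_r sum_{j in N_i^r} W_r h_j / |N_i^r|).
    Messages flow from source to target; add reverse edges for undirected graphs."""
    out = [matvec(W_self, x) for x in h]  # self-connection term

    # Group incoming neighbours by (target, relation) for the 1/|N_i^r| factor.
    incoming: Dict[Tuple[int, str], List[int]] = defaultdict(list)
    for src, rel, dst in edges:
        incoming[(dst, rel)].append(src)

    for (dst, rel), sources in incoming.items():
        for src in sources:
            message = matvec(W_rel[rel], h[src])
            out[dst] = [a + m / len(sources) for a, m in zip(out[dst], message)]

    return [[max(0.0, a) for a in vec] for vec in out]  # ReLU


identity = [[1.0, 0.0], [0.0, 1.0]]
half = [[0.5, 0.0], [0.0, 0.5]]
h = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]  # two mention nodes, one entity node
edges = [(0, "ME", 2), (1, "ME", 2), (0, "MM", 1), (1, "MM", 0)]
print(rgcn_layer(h, edges, identity, {"ME": identity, "MM": half}))
# [[1.0, 0.5], [0.5, 1.0], [0.5, 0.5]]
```

The entity node starts as a zero vector and ends up as the normalised average of its two mentions, while each mention also receives a down-weighted message through the "MM" relation.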
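The inference layer of DERE-R-GCNs performs inference over edges with an iterative algorithm whose update rule the abstract does not give. As a stand-in, the sketch uses walk aggregation of the kind found in edge-oriented graph models for document-level RE: an edge (i, j) is refreshed from two-hop paths i -> k -> j through sigmoid(e_ik * (W e_kj)) and mixed with its previous value by a weight beta, so each round doubles the path length an edge can summarise. `walk_step`, `iterative_inference`, `W` and `beta` are hypothetical names, and this need not match the thesis's algorithm.

```python
import math
from typing import List, Optional

Vector = List[float]
Matrix = List[List[float]]
EdgeGrid = List[List[Optional[Vector]]]  # edges[i][j] is None if i -> j is absent


def matvec(W: Matrix, x: Vector) -> Vector:
    """Matrix-vector product."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def walk_step(edges: EdgeGrid, W: Matrix, beta: float) -> EdgeGrid:
    """One round over all ordered pairs (i, j):
    e_ij <- beta * e_ij + (1 - beta) * sum_k sigmoid(e_ik * (W e_kj)),
    where k ranges over nodes with both i -> k and k -> j present and absent
    edges count as zero vectors."""
    n = len(edges)
    new_edges: EdgeGrid = [[None] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            paths = [
                [sigmoid(a * b) for a, b in zip(edges[i][k], matvec(W, edges[k][j]))]
                for k in range(n)
                if k not in (i, j) and edges[i][k] is not None and edges[k][j] is not None
            ]
            old = edges[i][j]
            if old is None and not paths:
                continue  # i still cannot reach j at this path length
            dim = len(old) if old is not None else len(paths[0])
            base = old if old is not None else [0.0] * dim
            path_sum = [sum(col) for col in zip(*paths)] if paths else [0.0] * dim
            new_edges[i][j] = [beta * o + (1 - beta) * p for o, p in zip(base, path_sum)]
    return new_edges


def iterative_inference(edges: EdgeGrid, W: Matrix, beta: float, rounds: int) -> EdgeGrid:
    """Repeat walk_step; each round doubles the path length an edge can summarise."""
    for _ in range(rounds):
        edges = walk_step(edges, W, beta)
    return edges


# Entities A (0) and B (1) are only connected through an intermediate node S (2).
A_S, B_S = [1.0, 0.0], [0.0, 2.0]
edges: EdgeGrid = [[None, None, A_S], [None, None, B_S], [A_S, B_S, None]]
identity = [[1.0, 0.0], [0.0, 1.0]]
result = iterative_inference(edges, identity, beta=0.5, rounds=1)
print(result[0][1])  # [0.25, 0.25]
```

After one round, A and B have an edge representation even though no direct edge existed; a classifier over such entity-pair edges is what the classification layer would consume.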
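DERE-R-GCNs is evaluated by Inter-F1 on CDR and GDA. The sketch below assumes the common convention that an entity pair counts as inter-sentence when its two entities never appear in the same sentence, and computes precision, recall and F1 on those pairs only. The triple layout, the `sentences_of` mapping and the toy data are hypothetical; "CID" is used only as an example relation label.

```python
from typing import Dict, Set, Tuple

Triple = Tuple[str, str, str, str]  # (doc id, head entity, tail entity, relation)
SentenceIndex = Dict[Tuple[str, str], Set[int]]  # (doc id, entity) -> sentences


def is_inter(triple: Triple, sentences_of: SentenceIndex) -> bool:
    """A pair is inter-sentence if its entities never share a sentence."""
    doc, head, tail, _ = triple
    return not (sentences_of[(doc, head)] & sentences_of[(doc, tail)])


def inter_prf(
    gold: Set[Triple], predicted: Set[Triple], sentences_of: SentenceIndex
) -> Tuple[float, float, float]:
    """Precision, recall and F1 restricted to inter-sentence pairs."""
    gold_inter = {t for t in gold if is_inter(t, sentences_of)}
    pred_inter = {t for t in predicted if is_inter(t, sentences_of)}
    true_pos = len(gold_inter & pred_inter)
    precision = true_pos / len(pred_inter) if pred_inter else 0.0
    recall = true_pos / len(gold_inter) if gold_inter else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Toy document: C appears in sentences 0 and 1, A only in 0, B only in 2.
sentences_of = {("d1", "A"): {0}, ("d1", "B"): {2}, ("d1", "C"): {0, 1}}
gold = {("d1", "A", "B", "CID"), ("d1", "C", "B", "CID"), ("d1", "A", "C", "CID")}
predicted = {("d1", "A", "B", "CID"), ("d1", "A", "C", "CID")}
p, r, f = inter_prf(gold, predicted, sentences_of)
print(p, r, round(f, 3))  # 1.0 0.5 0.667
```

The correctly predicted A-C pair does not count here because A and C share sentence 0, so it belongs to the intra-sentence split.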
Keywords/Search Tags: Document-level Named Entity Recognition, Document-level Entity Relation Extraction, Relational Graph Convolutional Networks, Iterative Algorithm