
Research On Two-stage GCN Entity Relation Extraction Based On Span Representation

Posted on: 2023-09-05
Degree: Master
Type: Thesis
Country: China
Candidate: Z Y Sun
Full Text: PDF
GTID: 2568307064970389
Subject: Computer technology
Abstract/Summary:
Entity relation extraction is an active task in natural language processing. Its purpose is to mine entities and relationships from unstructured text data, and it is generally divided into two sub-tasks: entity recognition and relation extraction. Recent studies have focused on extracting the two sub-tasks jointly through complex networks, which increases both the complexity of the network and the time overhead of the model. Compared with joint extraction, the pipeline method is simple and efficient and reduces noisy information, but it suffers from error propagation. This thesis therefore proposes a representation method based on spans and span pairs to reduce the error passed between the two tasks, making the model better suited to entity relation extraction. The specific contents are as follows:

(1) To solve the problem of nested entities, a method based on span representation is proposed. Entities are represented as entity spans: all candidate spans are obtained by enumeration, and every entity span within a nested entity is retained for span classification, which improves the accuracy of entity recognition. First, token representations are obtained with GloVe and SpanBERT, and the initial span set is obtained by enumeration. Then, the boundary of each span is strengthened by a boundary enhancement method to improve the accuracy of entity span boundary recognition. Next, a span filter filters and adjusts the spans subject to a pre-specified set of span lengths, and finally the spans are classified. Span boundary detection sharpens the extent of each entity span, which helps the span filter discard low-quality spans and complete the entity recognition task. The experimental results demonstrate the effectiveness of the span-based boundary enhancement model for entity recognition.

(2) To solve the long-distance reasoning problem of overlapping relations, the interaction between span pairs is fully considered on top of entity recognition, and a two-stage GCN relation extraction model based on span pairs is proposed to predict overlapping relations. The model improves the information interaction between span pairs and further improves long-distance learning in order to carry out relation classification. First, the entity spans obtained in the previous step are inserted into the corpus as floating markers and encoded with a Bi-LSTM to obtain the vector representation of each span pair. Then, relation reasoning is performed by a two-stage GCN. In the first stage, the GCN produces an initial relation score for each span pair. In the second stage, long-distance feature learning is performed to further screen out overlapping relations, and finally all relations between entities are obtained. In comparative experiments, the two-stage GCN relation extraction model based on span pairs improves the F1 value on multiple data sets, which demonstrates its effectiveness for relation extraction.

Overall, the experimental results show that the two-stage GCN entity relation extraction model based on span representation outperforms existing models on both the entity recognition task and the relation extraction task, and reduces the error between the two sub-tasks.

Figure [17] Table [13] Reference [71]...
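The span enumeration and length-constrained filtering described in part (1) can be illustrated with a minimal sketch. This is not the thesis's implementation: the function names `enumerate_spans` and `filter_by_length`, the example sentence, and the chosen lengths are all hypothetical, and the learned parts of the pipeline (boundary enhancement, quality scoring, span classification) are omitted. The sketch only shows why enumeration keeps both the outer and the inner span of a nested entity as candidates.

```python
from typing import List, Set, Tuple

Span = Tuple[int, int]  # (start, end) token indices, end inclusive


def enumerate_spans(tokens: List[str], max_len: int) -> List[Span]:
    """Return every contiguous span of at most max_len tokens."""
    spans: List[Span] = []
    for start in range(len(tokens)):
        # The last valid end index is capped by max_len and by the sentence end.
        for end in range(start, min(start + max_len, len(tokens))):
            spans.append((start, end))
    return spans


def filter_by_length(spans: List[Span], allowed_lengths: Set[int]) -> List[Span]:
    """Keep only spans whose token count is in allowed_lengths.

    Assumption: this models only the length constraint; the thesis's span
    filter also judges span quality, which is not reproduced here.
    """
    return [(s, e) for s, e in spans if (e - s + 1) in allowed_lengths]


def span_text(tokens: List[str], span: Span) -> str:
    start, end = span
    return " ".join(tokens[start:end + 1])


# Hypothetical sentence with a nested entity:
# "University of Washington" (organization) contains "Washington" (location).
tokens = ["University", "of", "Washington", "hired", "Smith"]
spans = enumerate_spans(tokens, max_len=3)
kept = filter_by_length(spans, allowed_lengths={1, 3})

print(len(spans), len(kept))
print(span_text(tokens, (0, 2)), "|", span_text(tokens, (2, 2)))
```

Because every span up to the length limit is generated, the outer span `(0, 2)` and the inner span `(2, 2)` both survive as candidates, so a span classifier can label each of them independently.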
Keywords/Search Tags:Entity relation extraction, entity identification, relation extraction, span representation, two-stage GCN
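For readers unfamiliar with the GCN named in the keywords, the following sketch shows one generic graph convolution step and how stacking two steps lets information travel two hops. The abstract does not state the propagation rule, the graph construction, or how the two stages are wired together, so everything here is an assumption: the symmetric-normalized rule ReLU(Â H W) with self-loops is the commonly used textbook form, and the three-node chain graph, the helper names, and the weights are hypothetical. It illustrates multi-hop propagation in general, not the thesis's two-stage model.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product of a (n x k) and b (k x m)."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def normalize_adjacency(adj: Matrix) -> Matrix:
    """Add self-loops, then scale entry (i, j) by 1 / sqrt(deg_i * deg_j)."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def gcn_layer(adj: Matrix, features: Matrix, weight: Matrix) -> Matrix:
    """One generic GCN step: ReLU(normalized_adj @ features @ weight)."""
    propagated = matmul(normalize_adjacency(adj), features)
    projected = matmul(propagated, weight)
    return [[max(0.0, v) for v in row] for row in projected]


# Hypothetical chain graph 0 - 1 - 2; only node 2 carries a signal.
adj = [[0.0, 1.0, 0.0],
       [1.0, 0.0, 1.0],
       [0.0, 1.0, 0.0]]
x = [[0.0], [0.0], [1.0]]
w = [[1.0]]  # identity weight so the propagation itself is visible

h1 = gcn_layer(adj, x, w)   # after one step, node 0 has not seen node 2
h2 = gcn_layer(adj, h1, w)  # after two steps, node 2's signal reaches node 0
print(h1[0][0], h2[0][0])
```

After one step node 0 still has a zero feature because it is not adjacent to node 2; after the second step it receives a nonzero value through node 1. This is the general sense in which repeated graph convolution supports longer-distance reasoning.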