
Towards Entity Relation Extraction On Long-tailed Data Distribution

Posted on: 2022-06-18
Degree: Master
Type: Thesis
Country: China
Candidate: H Y Yu
Full Text: PDF
GTID: 2518306551453624
Subject: Master of Engineering

Abstract/Summary:
With the rapid development of machine learning, deep learning, and related technologies, artificial intelligence has brought great convenience to our daily lives and work. However, for intelligent machines to understand the human world more deeply, they must first understand knowledge. A knowledge graph stores knowledge in a structured form and describes the abstract concepts, named entities, and relationships of the objective world, and the quality of knowledge graph construction depends on the performance of entity relation extraction. Current deep-learning-based entity relation extraction models rely on large amounts of labeled data, yet in real-world deployment scenarios rare categories make up a large share of the label space, so the long-tailed nature of the data distribution cannot be ignored. How to use the abundant labeled data of common head categories to remedy the poor extraction performance on rare tail categories under this long-tailed distribution is therefore an urgent research problem.

Existing work on the long-tailed characteristics of entity relation extraction datasets follows two lines: extraction under an imbalanced overall class distribution, and extraction in the few-shot setting formed by the long tail. This thesis proposes algorithmic models that improve entity relation extraction under long-tailed data distributions from both of these perspectives. For the imbalanced overall distribution, the extraction problem is approached by decoupling existing models, so that the key to improving tail-category performance lies in learning the classification layer. For the few-shot tail setting, we aim to make full use of the interaction between entities and relations and to bridge textual information with knowledge representations by building multi-type prototypes, thereby improving extraction performance with few samples. Specifically, the research content of this thesis includes the following two aspects:

1. For the class-imbalance scenario caused by the long-tailed distribution of the overall data, this thesis proposes to decouple existing deep-learning-based entity relation extraction models into a representation layer that extracts textual semantic information and a classification layer that maps those representations to specific categories. Probing experiments on the decoupled model show that commonly used natural sampling learns stronger representations than data resampling, loss-function weighting, and other class-rebalancing techniques. A classification-layer parameter learning algorithm based on a relational attention routing mechanism is then proposed, combining equal initialization of the relation capsule layer with multiple routing iterations across the capsule layer to improve overall extraction performance. Experiments on commonly used and artificially constructed long-tailed entity relation extraction datasets establish the effectiveness of the proposed method: extraction of rare tail categories improves without reducing head-category performance (an illustrative sketch of such a decoupled routing classifier is given after the abstract).

2. For the long-tailed few-shot scenario, in order to make full use of the implicit interaction between entity pairs and relations in knowledge triples, and drawing on the translation algorithm from knowledge representation learning, a multi-prototype embedding network is proposed to solve joint entity relation extraction in the few-shot learning setting. Specifically, the model designs a hybrid prototype learning mechanism that bridges the textual and knowledge views of entity pairs and relations, so that implicit associations between entities and relations are injected during learning. In addition, to improve the efficiency of prototype learning, a prototype-aware regularization constraint is introduced, which makes prototypes of the same type more concentrated and further enlarges the spatial distance between prototypes of different types (a sketch of one plausible form of this regularizer is also given after the abstract).
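The following is a minimal, hypothetical sketch of the decoupled setup described in point 1, not the thesis implementation: it assumes a frozen encoder that yields a fixed-size sentence vector and a capsule-style relation classification head with equal routing initialization and a few routing iterations. The class name `RelationRoutingHead` and the hyper-parameters `num_primary`, `capsule_dim`, and `num_iters` are illustrative assumptions.

```python
import torch
import torch.nn as nn


def squash(s: torch.Tensor, dim: int = -1, eps: float = 1e-8) -> torch.Tensor:
    """Capsule squashing: keeps the vector's direction, maps its length into (0, 1)."""
    norm_sq = (s ** 2).sum(dim=dim, keepdim=True)
    return (norm_sq / (1.0 + norm_sq)) * s / torch.sqrt(norm_sq + eps)


class RelationRoutingHead(nn.Module):
    """Hypothetical relation classification layer trained on top of frozen sentence features."""

    def __init__(self, hidden_dim: int, num_relations: int,
                 num_primary: int = 8, capsule_dim: int = 16, num_iters: int = 3):
        super().__init__()
        self.num_primary, self.capsule_dim = num_primary, capsule_dim
        self.num_relations, self.num_iters = num_relations, num_iters
        # Split the frozen sentence representation into a set of primary capsules.
        self.to_primary = nn.Linear(hidden_dim, num_primary * capsule_dim)
        # One transformation matrix per (primary capsule, relation capsule) pair.
        self.W = nn.Parameter(0.01 * torch.randn(num_primary, num_relations,
                                                 capsule_dim, capsule_dim))

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        # features: (batch, hidden_dim) produced by the frozen representation layer.
        batch = features.size(0)
        primary = self.to_primary(features).view(batch, self.num_primary, 1, 1, self.capsule_dim)
        # Each primary capsule casts a vote for every relation capsule.
        votes = (primary @ self.W).squeeze(3)      # (batch, num_primary, num_relations, capsule_dim)
        # Equal initialization: every relation starts with the same routing weight.
        logits = torch.zeros(batch, self.num_primary, self.num_relations, device=features.device)
        for _ in range(self.num_iters):
            coupling = torch.softmax(logits, dim=-1)                        # attention over relations
            capsules = squash((coupling.unsqueeze(-1) * votes).sum(dim=1))  # (batch, num_relations, capsule_dim)
            logits = logits + (votes * capsules.unsqueeze(1)).sum(-1)       # agreement update
        # The length of each relation capsule serves as that relation's score.
        return capsules.norm(dim=-1)               # (batch, num_relations)
```

In the decoupled regime sketched here, only the head's parameters would be optimized while the encoder stays frozen, mirroring the idea that tail-category gains come mainly from the classification layer rather than the representation layer.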
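Likewise, a minimal sketch of one plausible form of the hybrid prototypes and prototype-aware regularizer described in point 2, assuming class prototypes are simple means of support-set embeddings; the function names, the margin value, and the concatenation of text and knowledge embeddings are assumptions for illustration, not the author's model.

```python
import torch
import torch.nn.functional as F


def hybrid_prototypes(text_emb: torch.Tensor, kg_emb: torch.Tensor) -> torch.Tensor:
    """Hypothetical hybrid prototype per class: mean of concatenated text and knowledge embeddings.

    text_emb: (num_classes, num_shots, d_text); kg_emb: (num_classes, num_shots, d_kg).
    """
    return torch.cat([text_emb, kg_emb], dim=-1).mean(dim=1)      # (num_classes, d_text + d_kg)


def prototype_regularizer(support: torch.Tensor, margin: float = 1.0) -> torch.Tensor:
    """Pull same-type embeddings toward their prototype, push different prototypes apart.

    support: (num_classes, num_shots, dim) embeddings of the few-shot support set.
    """
    prototypes = support.mean(dim=1)                              # (num_classes, dim)
    # Intra-class term: embeddings of the same type concentrate around their prototype.
    intra = ((support - prototypes.unsqueeze(1)) ** 2).sum(-1).mean()
    # Inter-class term: prototypes of different types stay at least `margin` apart.
    dists = torch.cdist(prototypes, prototypes)                   # pairwise prototype distances
    off_diag = ~torch.eye(prototypes.size(0), dtype=torch.bool, device=support.device)
    inter = F.relu(margin - dists[off_diag]).mean()
    return intra + inter


def classify(queries: torch.Tensor, prototypes: torch.Tensor) -> torch.Tensor:
    """Few-shot prediction: softmax over negative distances to each class prototype."""
    return torch.softmax(-torch.cdist(queries, prototypes), dim=-1)   # (num_queries, num_classes)
```

In use, such a regularization term would simply be added with a small weight to the episode-level classification loss of the prototype network.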
Keywords/Search Tags: Entity Relation Extraction, Deep Learning, Long-tailed Distribution, Class Imbalance, Few-Shot Learning