| We live in an information-rich age in which text is one of the most important types of information. How to automatically extract useful information from massive unstructured text data is an active research topic in natural language processing. Relation extraction, an effective means of information extraction, identifies relations between entity pairs in sentences, paragraphs, and documents to support downstream tasks such as question answering, text retrieval, and knowledge graph construction. Traditional relation extraction techniques rely on supervised data, which limits them to extracting information within a narrow range, and they face many limitations when applied to semantically rich and varied Internet text. To avoid the high cost of manual annotation and the scarcity of training text, distantly supervised relation extraction was proposed. It requires no manual annotation: by aligning entities in text with a knowledge base, it obtains training data easily and uses the distant labels as training sample labels. Three problems remain in this line of research. First, because of the strong assumption underlying distant supervision, the datasets contain a large amount of noisy data. Second, it is unclear how to calculate the semantic similarity between the relation expressed in a sentence and the relation semantics of the remote knowledge base. Third, it is unclear how to handle multiple relation labels that correspond to the same entity pair. This paper adopts a method based on text similarity and an attention mechanism to reduce noisy data. First, the text similarity between the sample relation and the knowledge base relation is calculated, and samples with a high probability of being noise are filtered out. The remaining samples are then denoised with a multi-instance-level attention mechanism. At the same time, a multi-entity attention mechanism is applied to the knowledge base to update the relation vector representations and improve knowledge representation. In addition, this paper adopts a multi-label training method in which an entity pair can correspond to multiple relation labels, which overcomes the limitation of one-to-one relations. Trained on part of the NYT10 dataset, the model outperforms the classic baseline model in F1 value and PR curve, and it improves performance on distantly supervised entity relation extraction in some respects. |