| The relation extraction task aims to extract the semantic relation between two given entities from unstructured text, and it is one of the main sources of information for downstream knowledge graph construction. Traditional relation extraction methods require a large amount of high-quality annotated training data, which is often expensive to obtain. To obtain annotated data efficiently, researchers have proposed distant supervision, which automatically acquires large-scale annotated data by aligning a corpus with a knowledge base. However, because the assumption underlying this method is too strong, the labeled data often contains a large amount of noise, which severely degrades model performance. Developing a suitable denoising algorithm that reduces the influence of noise on the model is therefore of great significance for improving the accuracy of distantly supervised relation extraction. In this paper, we conduct an in-depth study of the denoising task in distantly supervised relation extraction. The main contributions are summarized as follows: 1. We propose a relation extraction model (BiLSTM+EA) that incorporates entity attribute information. Previous relation extraction methods often consider only the information contained in the sentence itself and ignore the attribute information behind the entities (e.g., entity descriptions, entity aliases, entity types), although this information usually helps the model generate richer sentence representations. We extract the corresponding attribute information for each entity from Freebase and Wikipedia. The entity features derived from this information enable the model to better discriminate noisy sentences. To reflect the importance of different attributes, we use knowledge graph embedding to assign weights to them. Finally, comparison experiments are conducted on NYT, a widely used distantly supervised
dataset, and the results show that the model outperforms other representation-enhancing methods. 2. Building on BiLSTM+EA, we introduce negative training and propose a relation extraction model called EANT. Unlike positive training, negative training is based on the idea that "this sentence does not express the target relation" and trains the model with a complementary label. We randomly select a relation label other than the original label of the sentence and train the model away from this complementary label. Training in this way widens the confidence gap between clean and noisy data, which makes noise filtering possible. Compared with other benchmark models on the widely used distantly supervised NYT dataset, the EANT model reduces noisy data and improves relation extraction performance, with experimental results significantly better than those of other mainstream models. 3. Based on the above models, we also propose a distantly supervised denoising framework built on EANT. The framework consists of two modules: a noise filter module and a noise cleaning module. The noise filter module uses the EANT model to filter out likely noisy data from the dataset, and the noise cleaning module manually re-labels some of the filtered noisy data. The framework thus combines coarse-grained noise discrimination by the EANT model with a fine-grained manual annotation strategy, introducing human common-sense knowledge to correct noisy data in the distantly supervised dataset and finally obtain relatively reliable denoised data. The framework is shown to have effective denoising capability on a manually annotated noisy dataset. |
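
The negative training idea in contribution 2 can be illustrated with a minimal sketch. It is not the EANT implementation: the four-relation logits, the helper names, and the 0.5 confidence threshold are all assumptions chosen for illustration. The sketch samples a complementary label different from the distant label, computes the standard negative-training loss -log(1 - p_c) for that label, and flags a sentence as possibly noisy when the model's confidence on its distant label is low.

```python
import math
import random


def softmax(logits):
    """Convert raw relation scores into a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_complementary_label(label, num_relations, rng):
    """Pick a random relation index other than the given (possibly noisy) label."""
    candidates = [r for r in range(num_relations) if r != label]
    return rng.choice(candidates)


def negative_training_loss(probs, complementary_label, eps=1e-12):
    """Return -log(1 - p_c); small when little probability sits on the complementary label."""
    return -math.log(max(1.0 - probs[complementary_label], eps))


def is_probably_noisy(probs, label, threshold=0.5):
    """Flag a sentence whose confidence on its distant label is below an assumed threshold."""
    return probs[label] < threshold


rng = random.Random(0)
logits = [2.0, 0.5, -1.0, 0.1]  # hypothetical scores over 4 relations
label = 0                       # label assigned by distant supervision
probs = softmax(logits)
comp = sample_complementary_label(label, len(logits), rng)
loss = negative_training_loss(probs, comp)
print(comp, round(loss, 4), is_probably_noisy(probs, label))
```

Minimizing this loss pushes probability away from the sampled complementary label rather than toward the distant label, so a mislabeled sentence is not forced to fit its wrong label. In practice the threshold would have to be tuned on held-out data.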