| Relation extraction is an important step in building knowledge graphs and has considerable research value and broad application prospects. As one of the fundamental tasks in natural language processing, relation extraction aims to identify semantic relations between entities in unstructured text. Unlike supervised relation extraction, which requires high-quality manually annotated data, distantly supervised relation extraction uses heuristic rules to annotate data at large scale. Label noise and long-tailed relations are two major challenges in the distantly supervised relation extraction task. To address these two challenges, this paper improves an existing noise reduction model and an existing long-tail relation model, respectively, and builds a Web-based entity relation extraction application system. The main research contents of this paper are as follows: (1) The automatic annotation process of distant supervision, which aligns relation instances in a knowledge base with unstructured text, inevitably introduces noisy data, and utilizing the large amount of external information related to entities in the knowledge base is currently an effective means of noise reduction. The RESIDE model, which uses entity type and relation alias information to impose soft constraints when predicting relations, is investigated, and the BERT pre-trained model is used to optimize the sentence encoding module, which consists of a bidirectional gated recurrent unit network and a graph convolutional network, to improve the model’s ability to capture semantic associations and dependencies in long sequences. The self-attention mechanism is used to combine contextual information and dynamically adjust the word vectors, addressing the word polysemy problem caused by GloVe (Global Vectors for Word Representation) word embeddings. Experiments are conducted on two public distantly supervised datasets, and the noise reduction metric P@N is improved by 5%. The experimental results show that fusing external 
information based on the BERT pre-trained model can bring significant performance improvement to the RESIDE model. (2) Recent research has made great progress in noise reduction with the help of the multi-instance learning framework, but even introducing a relation hierarchy to share knowledge has not effectively solved the long-tail relation problem. In this paper, a constraint graph structure is introduced to model the dependencies between relation labels, and a long-tail relation extraction framework (LTRE) is designed on this basis. Using the neighbor aggregation mechanism and attention mechanism of graph attention networks, information is passed from data-rich head relation nodes to data-poor tail relation nodes according to different weights. The pre-trained model DeBERTa (Decoding-enhanced BERT with disentangled attention) is used to encode sentence information in combination with piecewise convolutional neural networks, while incorporating entity type information. Experiments on the publicly available distantly supervised dataset NYT-10 show average improvements of about 0.5% and 2.75% in the P@N and Hits@K metrics, respectively, and the experimental results show that these improvements effectively enhance the model’s robustness to noise and its long-tail relation extraction ability. (3) Based on the study of relation extraction models, this paper develops a Web-based relation extraction application system. The system adopts front-end and back-end separation and modular design to build the Web side, and finally exposes it as an interface for the server side to call. The system is mainly responsible for data management, relation extraction, user permission control, and knowledge graph visualization. In practice, the system can output the entity-pair relations in text in a faster and more intuitive way, and saves the results produced by the algorithm models in the form of a knowledge graph 
in the database. |
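
The P@N metric reported above can be illustrated with a minimal sketch. It assumes the common definition used in distantly supervised relation extraction: rank all predictions by confidence and measure the fraction of correct ones among the top N. The abstract does not spell out the exact evaluation protocol, so the function name `precision_at_n`, the `(score, is_correct)` input format, and the toy data are hypothetical and serve only as illustration.

```python
from typing import List, Tuple


def precision_at_n(predictions: List[Tuple[float, bool]], n: int) -> float:
    """Fraction of correct predictions among the n highest-scoring ones.

    predictions: (confidence score, is_correct) pairs; this format is an
    assumption made for illustration, not the paper's data layout.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    # Rank by confidence, highest first, and keep the top n.
    ranked = sorted(predictions, key=lambda p: p[0], reverse=True)
    top = ranked[:n]
    if not top:
        return 0.0
    # Divide by the number of predictions actually available, so a list
    # shorter than n is not penalized in this sketch.
    return sum(1 for _, correct in top if correct) / len(top)


# Toy predictions (hypothetical): confidence and whether the relation is right.
toy_predictions = [
    (0.95, True),
    (0.90, True),
    (0.80, False),
    (0.70, True),
    (0.40, False),
]

for n in (1, 3, 5):
    print(f"P@{n} = {precision_at_n(toy_predictions, n):.3f}")
```

A model that assigns high confidence to noisy labels places wrong predictions near the top of the ranking, which is why P@N serves as an indicator of noise reduction.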