
Research on the Key Methods for Cost-Sensitive Distant Supervision Relation Extraction

Posted on: 2019-07-05
Degree: Master
Type: Thesis
Country: China
Candidate: Y Xiao
GTID: 2428330602960560
Subject: Engineering
Abstract:
With the explosive growth of information on the Internet, it has become increasingly important to help people quickly discover and understand knowledge in massive amounts of unstructured text, and to represent that text in a form computers can "understand", so as to reduce people's learning costs. Information extraction technology emerged in this context, and relation extraction is one of its most important parts. Relation extraction aims to automatically identify the semantic relation between a pair of entities in structured or unstructured text and to store the extracted information as triples. Supervised relation extraction methods have proven effective and achieved relatively good results. However, they require large amounts of labeled training data, and labeling data by hand is time-consuming and labor-intensive. Distant supervision was proposed to overcome this shortage of training data by generating it automatically. Its assumption is that if two entities hold a relation in a known knowledge base, then every sentence that mentions both entities expresses that relation in some way. In other words, distant supervision can iteratively expand the relation set starting from a candidate relation, discovering further relations on the Web and adding them to the candidate set.
However, because real-world datasets are often imbalanced, the data automatically annotated by distant supervision suffers from a significant class imbalance problem. Classifiers trained on imbalanced datasets are strongly biased toward classes with many samples, so minority classes are often misclassified and the model performs even worse. This thesis studies the class imbalance problem in relation extraction based on convolutional neural networks. The specific work is as follows:
(1) First, it introduces the background and the theoretical and technical foundations of entity relation extraction, such as convolutional neural networks and multi-instance learning. It then compares and analyzes several methods for handling imbalanced datasets.
(2) Second, it proposes a cost-sensitive ranking loss for the class imbalance that arises in distant supervision relation extraction. The loss increases the penalty for misclassifying minority-class instances: when an instance of a class with few samples is misclassified, it incurs a higher cost than a misclassified majority-class instance. This reduces the bias toward the majority classes during training and thereby improves the accuracy and recall of relation extraction.
(3) Finally, since learning performance on imbalanced datasets depends largely on the separability between classes, this thesis introduces a measure of class separability. Combining class separability with cost sensitivity further mitigates the class imbalance problem in distant supervision relation extraction.
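The cost-sensitive ranking loss described in (2) can be illustrated with a minimal Python sketch. It assumes a margin-based pairwise ranking loss in the style of CR-CNN (dos Santos et al., 2015), which pushes the gold relation's score above a positive margin and the highest-scoring wrong relation's score below a negative margin. It also assumes a per-class cost of n_max / n_c, where n_c is the number of training instances labeled with class c. The abstract confirms neither choice. The function names `class_costs`, `softplus`, `cost_sensitive_ranking_loss`, and `batch_loss` and the hyperparameters `gamma`, `m_pos`, and `m_neg` (with illustrative default values) are hypothetical. Scores are passed as plain lists, standing in for the bag-level scores a CNN with multi-instance learning would produce.

```python
import math
from collections import Counter
from typing import List, Sequence


def class_costs(labels: Sequence[int], num_classes: int) -> List[float]:
    """Hypothetical cost scheme (assumption): cost_c = n_max / n_c.

    The most frequent class gets cost 1.0; a class with k times fewer
    samples gets cost k. Classes absent from `labels` count as one sample.
    """
    counts = Counter(labels)
    n_max = max(counts.values())
    return [n_max / max(counts[c], 1) for c in range(num_classes)]


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def cost_sensitive_ranking_loss(
    scores: Sequence[float],
    gold: int,
    costs: Sequence[float],
    gamma: float = 2.0,  # scaling factor (illustrative value)
    m_pos: float = 2.5,  # margin for the gold class (illustrative value)
    m_neg: float = 0.5,  # margin for the best wrong class (illustrative value)
) -> float:
    """Pairwise ranking loss for one bag, weighted by its gold class's cost.

    Assumed CR-CNN-style form, not confirmed by the thesis:
        costs[gold] * ( log(1 + exp(gamma * (m_pos - s_gold)))
                      + log(1 + exp(gamma * (m_neg + s_neg))) )
    where s_neg is the highest score among the wrong classes.
    """
    s_gold = scores[gold]
    # Hardest negative: the wrong class the model currently scores highest.
    s_neg = max(s for c, s in enumerate(scores) if c != gold)
    loss = softplus(gamma * (m_pos - s_gold)) + softplus(gamma * (m_neg + s_neg))
    # Minority classes carry a larger cost, so their errors are penalized more.
    return costs[gold] * loss


def batch_loss(
    batch_scores: Sequence[Sequence[float]],
    batch_gold: Sequence[int],
    costs: Sequence[float],
) -> float:
    """Mean cost-sensitive ranking loss over a batch of bags."""
    total = sum(
        cost_sensitive_ranking_loss(s, y, costs)
        for s, y in zip(batch_scores, batch_gold)
    )
    return total / len(batch_gold)
```

For example, with training labels `[0, 0, 0, 0, 1]`, `class_costs(..., num_classes=2)` returns `[1.0, 4.0]`, so a misclassified instance of minority class 1 contributes four times the loss of an equally misclassified majority-class instance. The abstract does not describe how the class-separability measure from (3) is combined with these costs, so the sketch stops at frequency-based costs.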
Keywords: Distant supervision, relation extraction, cost-sensitive, class separability, convolutional neural network