Research On Chinese Entity Relation Extraction For Contract Domain

Posted on: 2024-02-22  Degree: Master  Type: Thesis
Country: China  Candidate: T T He  Full Text: PDF
GTID: 2568307055975209  Subject: Computer Science and Technology
Abstract/Summary:
Contract review is crucial to business operations. Extracting entities and their relations from contract texts with Natural Language Processing techniques is a focus of research in the contract domain and a fundamental task for intelligent contract review. Because contracts contain confidential information about the signing parties, little public data is available. This paper therefore obtains contract data from real business scenarios and studies entity relation extraction for the contract domain. The main research contents are as follows:

1. Because no contract dataset is available, this paper constructs an entity and relation dataset for the contract domain by manual annotation. The original contract corpus is first pre-processed. Annotation specifications are then formulated based on expert knowledge and The Civil Code of the People's Republic of China, and annotators are trained on the annotation platform and rules. Finally, the annotators label the data manually, and reviewers carry out regular sampling checks to ensure annotation quality. The resulting contract dataset contains 11 types of entities and 8 types of relations.

2. A contract entity recognition and entity relation extraction method incorporating lexical boundary information is proposed. Following the pipeline model, the paper divides relation extraction into two subtasks: entity recognition and relation classification. In entity recognition, character-based methods can hardly make full use of lexical information, so static word vectors trained with Word2vec and dynamic word vectors produced by BERT are concatenated in the embedding layer to supplement the lexical information. To reduce errors in locating contract entities, the vanilla Transformer structure is modified to encode position information in the encoding layer, which improves recognition and locates entity boundaries more accurately. In relation classification, this paper introduces the PCNN structure to further improve classification accuracy. The experimental results show that concatenating multiple word vectors and improving the position encoding both improve the model's ability to recognize entities, and that PCNN can effectively model the contract context and make further use of position features.

3. An entity relation extraction method based on a double-layer attention mechanism is proposed. In the pipeline model the two subtasks do not interact, so information is lost between them. To address this, the paper uses a double-layer attention mechanism to model the importance of information and to extract contract entities and relations jointly. The first attention layer captures global contextual information. The second attention layer assigns weights to the key information that has a significant impact on the target words, in order to extract deeper semantic information. To better capture sequential information, the double-layer attention mechanism is placed on top of a BiLSTM-based encoder. The experimental results show that the double-layer attention mechanism can effectively capture latent semantic features and improve relation extraction results, and that the joint extraction model avoids the loss of key information and further improves contract relation extraction.
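
As an illustration of how a PCNN can use entity positions in relation classification, the following is a minimal sketch of piecewise max pooling, the pooling step that gives PCNN its name. It is not the thesis's implementation: the function name, the parameter names, the choice to include each entity token in the segment that ends at it, and the zero vector used for an empty segment are all assumptions made for this example.

```python
from typing import List, Sequence


def piecewise_max_pool(
    features: Sequence[Sequence[float]],
    head_idx: int,
    tail_idx: int,
) -> List[float]:
    """Max-pool each feature channel over three segments of a sentence.

    features: one feature vector per token (for example, convolution
        outputs); every vector has the same width.
    head_idx, tail_idx: token positions of the two entities
        (hypothetical names, assumed for this sketch).

    The sentence is split into: tokens up to and including the first
    entity, tokens after it up to and including the second entity, and
    the remaining tokens. An empty segment pools to zeros (an assumption).
    """
    if not features:
        raise ValueError("features must be non-empty")
    first, second = sorted((head_idx, tail_idx))
    if first < 0 or second >= len(features):
        raise ValueError("entity positions must lie inside the sentence")

    width = len(features[0])
    segments = [
        features[: first + 1],
        features[first + 1 : second + 1],
        features[second + 1 :],
    ]

    pooled: List[float] = []
    for segment in segments:
        if segment:
            # Channel-wise maximum over the tokens in this segment.
            pooled.extend(max(row[c] for row in segment) for c in range(width))
        else:
            pooled.extend([0.0] * width)
    return pooled


# Five tokens with two feature channels each; entities at positions 1 and 3.
example = [[1.0, 0.0], [0.0, 2.0], [3.0, 1.0], [0.0, 0.0], [2.0, 5.0]]
pooled_example = piecewise_max_pool(example, head_idx=1, tail_idx=3)
```

Pooling the three segments separately, rather than the whole sentence at once, keeps coarse information about where a feature fired relative to the two entities, which is one way a model can make use of position features.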
Keywords/Search Tags:Contract text, Chinese Entity Recognition, Entity Relation Extraction, Double-Layer Attention mechanism