Patent Causality Extraction Based On Bidirectional LSTM

Posted on: 2021-08-06
Degree: Master
Type: Thesis
Country: China
Candidate: Z Yang
GTID: 2518306560953169
Subject: Master of Engineering
Abstract/Summary:
Causal relations reflect how things develop in sequence from cause to effect. The causal relations in patent text embody the core of the patented technology; extracting them helps mine patent information precisely and also helps build patent knowledge graphs. Although causality extraction is a classic research direction in natural language processing, few studies address patent corpora, and the methods currently used for causality extraction have not examined in depth the extraction of implicit causality or the problem of fuzzy causal boundary identification. To address these problems, the main work of this thesis includes:
(1) Constructing a patent causal indicator list. In view of the difficulty of extracting implicit causality, this thesis analyzes the characteristics of implicit causality in patent texts, proposes a definition of implicit causality in patent texts, extracts the conjunctions that can signal implicit causality in patents, builds a seed vocabulary by combining them with conventional causality indicators, and expands the vocabulary using the synonym forest.
(2) Extracting the relative position information and feature information between other words and the indicator words, in order to resolve the fuzziness of causal boundary recognition.
(3) Proposing a BiLSTM model that integrates syntactic and dependency path information. In patent text, related words and phrases may not be adjacent to each other, which creates a long-distance dependency problem. BiLSTM is used to integrate syntactic and dependency path information so that the dependency features of distant words do not vanish as sentence length grows. Syntactic and dependency path analysis provide, in a tree structure, the dependency or collocation relations between a word and distant words. An attention mechanism is introduced to weight the word vector features and the syntactic features more reasonably, and a CRF (conditional random field) completes the final causal relation recognition task.
(4) Building a corpus and running comparative experiments. From the abstract and technical background sections of patents, 9,836 sentences containing causality were extracted. There are 5,827 patent implicit causal sentences. The F1 value is used as the evaluation index of extraction quality, and the same data set is used for the CRF model, the BiLSTM model, the BiLSTM-CRF model, and the model proposed in this thesis that fuses syntactic and dependency path analysis. The experimental results show that, on a corpus with an average sentence length of 36.2, the proposed model reaches an average F1 value of 75.05%, a significant improvement over the comparison models.
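As an illustration of the relative position information mentioned in item (2), the following is a minimal sketch, not the thesis's actual feature design, which the abstract does not specify. It assumes whitespace-tokenized input and a hypothetical indicator set, and computes each token's signed distance to the nearest causal indicator.

```python
from typing import List, Optional, Set


def relative_positions(tokens: List[str], indicators: Set[str]) -> List[Optional[int]]:
    """Signed offset from each token to the nearest causal indicator.

    Negative values mean the token precedes the indicator, positive values
    mean it follows. Returns None for every token when the sentence contains
    no indicator. The indicator set is a hypothetical stand-in for the
    thesis's causal indicator list.
    """
    indicator_idx = [i for i, tok in enumerate(tokens) if tok.lower() in indicators]
    if not indicator_idx:
        return [None] * len(tokens)

    offsets: List[Optional[int]] = []
    for i in range(len(tokens)):
        # On a tie, min() keeps the earlier indicator.
        nearest = min(indicator_idx, key=lambda j: abs(i - j))
        offsets.append(i - nearest)
    return offsets


# Hypothetical example sentence and indicator set.
sentence = "the coating cracks because the substrate expands".split()
features = relative_positions(sentence, {"because"})
```

In a tagging model of the kind described, such offsets would typically be embedded and concatenated with the word vectors before the BiLSTM layer; that wiring is an assumption here, not something the abstract states.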
Keywords/Search Tags:patent, causality, syntactic analysis, attention mechanism, causal indicators