Research On Relation Extraction Method In Financial Domain

Posted on: 2023-05-06
Degree: Master
Type: Thesis
Country: China
Candidate: R T Guo
Full Text: PDF
GTID: 2568307103494624
Subject: Software engineering
Abstract/Summary:
With the rapid development of the Internet, digitalization in the financial field has accelerated and generated a large amount of data, most of it unstructured text. Using financial text data accurately and efficiently has become a key issue. The relation extraction task aims to extract structured triples from unstructured text, which helps to analyze massive financial text data accurately and has received extensive attention. However, applying existing relation extraction models in the financial field raises the following problems: (1) Deep learning-based relation extraction models require a large amount of training data, but manually labeling data in the financial field requires domain expertise and is costly, which limits how well the models can be trained. (2) Compared with general text, financial sentences are longer and contain more irrelevant content. The dependencies between entities are longer and more complex, resulting in higher computational costs for existing models. (3) Financial entities are crucial to the representation of textual relations, yet existing models do not fully learn entity contextual information and entity semantic information. These problems limit the application of existing relation extraction models to financial texts.

To solve these problems, this thesis proposes a data augmentation method for relation extraction in the financial field and a network model, ESAN (Joint Entity and Syntax Attention Network for relation extraction), that combines syntax and entity information. The contributions are as follows: (1) Given the high cost of obtaining labeled data in the financial field, this thesis adopts a rewriting-based method to generate texts with the same semantics but different expressions, and uses entity masks to ensure that the generated texts contain the entity expressions. Adversarial training is used for noise reduction when fully annotated samples are available. Experimental results show that the proposed data augmentation method improves model performance in small-scale data scenarios and outperforms other augmentation methods, and that adversarial training reduces the negative impact of noisy data on the model in fully labeled scenarios. (2) In ESAN, to address the high computational cost of models on financial texts, this thesis builds a syntax-aware attention layer based on dependency syntax, using the shortest dependency path and the dependency distance between entities. This not only retains the important entity-related structures in the text but also reduces the computational complexity of the syntax-aware attention layer and thus the computational cost of the model. (3) To fully learn entity-related semantic information, ESAN uses an entity type attention layer to enhance entity representations and enrich entity knowledge, and an entity local attention layer to obtain entity-fused text representations and guide the model to attend to entity-related semantic information.

Experimental results show that ESAN achieves superior performance on the public financial relation extraction dataset FinRE, with Micro-F1 at least 2.51% higher than the comparison models, demonstrating the effectiveness of the model. Finally, this thesis takes entity relations in the investment banking field as an example to carry out a practical application and visual display of relation extraction.
Keywords/Search Tags: relation extraction, financial text, dependency structure, attention mechanism, data augmentation
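
The abstract describes restricting attention by the shortest dependency path and the dependency distance between entities. The following is a minimal sketch of that general idea, not the ESAN implementation from the thesis. The function names, the `max_dist` threshold, the example sentence, and its hand-written dependency heads are all assumptions made for illustration. It computes the shortest dependency path between two entity tokens and a boolean mask that keeps only tokens within `max_dist` dependency hops of that path.

```python
from collections import deque


def build_adjacency(heads):
    """Undirected adjacency from a head list; heads[i] == -1 marks the root."""
    adj = {i: set() for i in range(len(heads))}
    for child, head in enumerate(heads):
        if head >= 0:
            adj[child].add(head)
            adj[head].add(child)
    return adj


def bfs_distances(adj, start):
    """Dependency distance (number of arcs) from `start` to every reachable token."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in adj[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


def shortest_dependency_path(adj, src, dst):
    """Token indices on the shortest path from `src` to `dst` in the parse tree."""
    parent = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            break
        for nxt in adj[node]:
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    if dst not in parent:
        raise ValueError("entities are not connected in the dependency graph")
    path, node = [], dst
    while node is not None:
        path.append(node)
        node = parent[node]
    return path[::-1]


def syntax_mask(heads, head_entity, tail_entity, max_dist=1):
    """Keep tokens within `max_dist` arcs of the shortest dependency path (hypothetical rule)."""
    adj = build_adjacency(heads)
    path = shortest_dependency_path(adj, head_entity, tail_entity)
    per_path_token = [bfs_distances(adj, p) for p in path]
    keep = [
        min(d.get(tok, float("inf")) for d in per_path_token) <= max_dist
        for tok in range(len(heads))
    ]
    return path, keep


# Hand-written example parse (an assumption, not parser output).
tokens = ["Alibaba", ",", "based", "in", "Hangzhou", ",", "acquired", "Ele.me", "last", "year"]
heads = [6, 0, 0, 4, 2, 0, -1, 6, 9, 6]

path, keep = syntax_mask(heads, head_entity=0, tail_entity=7, max_dist=1)
print([tokens[i] for i in path])
print([tok for tok, k in zip(tokens, keep) if k])
```

In an attention layer, a mask like `keep` would typically be used to exclude the dropped tokens before the softmax, so that fewer positions take part in the computation. How ESAN actually combines the path and the distance is not specified in the abstract.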