
Biomedical Entity Relationship Extraction And Application Research Based On The Multi-view Model L

Posted on: 2023-10-30
Degree: Master
Type: Thesis
Country: China
Candidate: L Qin
GTID: 2530307118996349
Subject: Computer Science and Technology
Abstract/Summary:
In recent years, the growing emphasis on drug development, disease treatment, and medical database construction has driven the rapid development of bioinformatics, and the number of medical research texts produced each year has grown exponentially. Mining and analyzing the knowledge in these texts can greatly advance the biomedical field. As an important branch of text mining, entity relationship extraction can automatically and efficiently extract structured knowledge from unstructured text. However, biomedical text has domain-specific characteristics such as complex entity structures and long sentences, and annotated datasets in this domain are small. Together, these factors make it difficult to fully learn the semantic and syntactic information of sentences from such contextually complex text, and therefore difficult to accurately mine the structured information it contains.

This thesis first investigates efficient contextual sequence feature learning models to better capture the semantics of text. Building on such a model, it studies a named entity recognition algorithm (named entity recognition being the task that directly underlies entity relationship extraction) that adopts the Machine Reading Comprehension (MRC) framework to recognize entities in text. It then combines the contextual sequence feature learning model with a dependency syntax feature learning model to propose a biomedical entity relationship extraction model based on multi-view feature fusion. Finally, it constructs a compound-protein knowledge base and a visual query system to meet the requirements of practical scenarios. Specifically, the main research covers the following aspects.

Firstly, research on a contextual sequence feature learning method based on pre-trained models. Annotated datasets in the biomedical domain are small, and existing models, including some pre-trained models, do not learn enough sequence feature information on specific tasks. To address this, the Chemical BERT model is proposed, which undergoes secondary pre-training on a large amount of domain-specific text. The model therefore carries more domain feature information than the original pre-trained model. Experimental results show that, after fine-tuning on different compound-related tasks, the model exhibits excellent text sequence feature learning ability and obtains higher F1 values on several public datasets for entity recognition and relationship extraction.

Secondly, research on a named entity recognition model that introduces the MRC task, named Chemical BERT-MRC. Entities in the biomedical field have complex structures and many of them overlap. By analyzing the structural characteristics of overlapping entities, this thesis investigates a named entity recognition model that introduces the MRC task on top of the Chemical BERT model in order to better identify nested entity boundaries. In addition, it is found that introducing prior knowledge into the model input leads to better recognition results. The model obtains the highest F1 values on several publicly available datasets, which provides a reliable foundation for the relationship extraction task.

Thirdly, research on a multi-view relationship extraction model that incorporates both the contextual sequence information and the dependency syntax information of a sentence. To address the problem of long sentences and complex contexts in the biomedical field, a graph neural network based on a multi-headed attention mechanism is first designed to capture dependency syntax information between more distant entities. The network lets "neighbor" nodes that share a dependency relationship fuse information with each other, and after several layers of iteration a node can perceive entities that are more than one dependency edge away. This network is combined with the Chemical BERT model to form a multi-view feature learning model with integrated feature learning capability. Comparative experiments on public datasets show that the model attends more adequately to the information features of different subspaces of a sentence and achieves the highest F1 value.

Finally, construction of a compound-protein knowledge base and a visual query system. A large amount of biomedical literature is first collected from PubMed. Based on the studies above, large-scale structured knowledge about compound-protein interactions is then extracted from this literature to build a knowledge base, and a visual query system is established for it. The system also provides a knowledge update interface so that the knowledge base remains timely and usable.

The entity relationship extraction model based on multi-view feature fusion proposed in this thesis is important for structured information extraction and knowledge discovery from natural language text in the biomedical field, and it can provide a certain degree of assistance to downstream tasks such as drug target identification and drug discovery.
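
The neighbor-to-neighbor information fusion described for the dependency-syntax view can be illustrated with a small sketch. This is not the thesis's model: it is a minimal, weight-free version of neighbor-masked multi-head attention in plain Python, assuming a hypothetical four-token dependency chain and made-up feature vectors. It omits the learned projection matrices that a trained network would have. The helper names (`masked_attention_layer`, `run_layers`, `token0_changes`) are invented for this example. The sketch only demonstrates the receptive-field argument: with one layer a token is influenced by its direct dependency neighbors, and stacking layers lets information from more distant tokens reach it.

```python
import math


def masked_attention_layer(features, adjacency, num_heads=2):
    """One layer of neighbor-masked multi-head self-attention (no learned weights).

    Assumes the feature dimension is divisible by num_heads.
    """
    n, dim = len(features), len(features[0])
    head_dim = dim // num_heads
    output = []
    for i in range(n):
        # A token attends only to itself and to its dependency neighbors.
        visible = [j for j in range(n) if j == i or adjacency[i][j]]
        row = []
        for h in range(num_heads):
            lo, hi = h * head_dim, (h + 1) * head_dim
            # Scaled dot-product scores inside this head's feature slice.
            scores = [
                sum(a * b for a, b in zip(features[i][lo:hi], features[j][lo:hi]))
                / math.sqrt(head_dim)
                for j in visible
            ]
            peak = max(scores)  # subtract the max for a numerically stable softmax
            weights = [math.exp(s - peak) for s in scores]
            total = sum(weights)
            # Softmax-weighted average of the visible tokens' slices.
            for d in range(lo, hi):
                row.append(
                    sum(w * features[j][d] for w, j in zip(weights, visible)) / total
                )
        output.append(row)
    return output


def run_layers(features, adjacency, num_layers):
    """Apply the masked attention layer num_layers times."""
    for _ in range(num_layers):
        features = masked_attention_layer(features, adjacency)
    return features


# Hypothetical dependency chain 0 - 1 - 2 - 3, stored as a symmetric adjacency matrix.
adjacency = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]

# Made-up 4-dimensional token features; only token 3 differs between the two inputs.
base = [
    [1.0, 0.0, 0.5, 0.2],
    [0.0, 1.0, 0.3, 0.1],
    [0.5, 0.5, 0.0, 1.0],
    [0.2, 0.1, 1.0, 0.0],
]
changed = [row[:] for row in base]
changed[3] = [0.9, 0.9, 0.1, 0.7]


def token0_changes(num_layers):
    """Return True if editing token 3 alters token 0's output after num_layers layers."""
    before = run_layers(base, adjacency, num_layers)[0]
    after = run_layers(changed, adjacency, num_layers)[0]
    return any(abs(a - b) > 1e-9 for a, b in zip(before, after))
```

Calling `token0_changes(k)` for increasing `k` shows when token 3, which is three dependency edges away from token 0, starts to influence token 0's representation. In this chain that requires three stacked layers, which is the sense in which multi-layer iteration lets a node perceive more distant entities.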
Keywords/Search Tags: Multi-view feature fusion, Relationship extraction, Nested entity recognition, Multi-headed attention mechanism, Knowledge base construction