Extraction of Chemical-Protein Interaction Relationships from Biomedical Texts Based on Deep Learning

Posted on: 2023-07-04
Degree: Master
Type: Thesis
Country: China
Candidate: M Y Li
GTID: 2544306842970189
Subject: Master of Science in Biology and Medicine (Professional Degree)
Abstract/Summary:
A large number of chemical-protein interaction (CPI) relationships are hidden in the biomedical literature, and the relationships between these biological entities play an important role in drug discovery, clinical medicine, and the construction of structured biomedical databases. However, manual extraction of CPIs is expensive and time-consuming, so automatically extracting CPIs from the biomedical literature with Natural Language Processing (NLP) technology is an important and valuable task. The task has considerable application value, and there is still much room for improvement.

First, this study selects three representative pre-trained language models from the NLP field and compares their performance on the CPI task experimentally. To address the scarcity of annotated corpora that is common in relation extraction from biomedical texts, the study takes these language models, which have been pre-trained on large external corpora, and fine-tunes them on the CPI task corpus to obtain better text feature representations. A comparison of Word2Vec, ELMo, and BERT on the CPI task shows that the pre-trained BERT model, after fine-tuning, greatly improves the performance of CPI relation extraction.

Second, this study proposes a new neural network-based multi-class classification model that uses both the semantic and the syntactic information in the text for CPI extraction. The pre-trained BERT model is first fine-tuned on the ChemProt corpus to learn deep contextual representations of sentences. The resulting contextual representation is then fed into a Bidirectional Long Short-Term Memory network (Bi-LSTM), which is combined with a multi-head attention mechanism to encode the semantic features of entity pairs. In parallel, the Shortest Dependency Path (SDP) between the two entities in the sentence is extracted, and the word sequence on the SDP is used as input to a Convolutional Neural Network (CNN) that learns the syntactic features of entity pairs. Finally, the semantic and syntactic features are combined to obtain the deep features of each sample, and a classification function predicts the CPI relationship. Experiments on the ChemProt corpus show that the proposed model achieves an F1 score of 0.773, which is significantly higher than that of existing state-of-the-art methods.

Finally, this study designs and implements an interactive CPI relation extraction system with Web access. The system can retrieve the PubMed literature that contains a specified chemical-protein entity pair and can also predict the CPI relationship type of an entity pair, which makes data processing more convenient for researchers in the field.
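
The SDP extraction step mentioned in the abstract can be illustrated with a minimal Python sketch. It assumes that a dependency parse is already available as (head, dependent) token-index pairs. The example sentence, the hand-written parse, and the function name shortest_dependency_path are hypothetical and are not taken from the thesis, which does not state which parser or implementation it uses. The dependency tree is treated as an undirected graph and searched breadth-first.

```python
from collections import deque


def shortest_dependency_path(edges, source, target):
    """Return the token indices on the shortest path between two tokens.

    edges: iterable of (head_index, dependent_index) pairs from a parse.
    The tree is treated as undirected, as is usual for SDP features.
    Returns an empty list if either token is missing or unreachable.
    """
    # Build an undirected adjacency list from the dependency arcs.
    graph = {}
    for head, dep in edges:
        graph.setdefault(head, []).append(dep)
        graph.setdefault(dep, []).append(head)
    if source not in graph or target not in graph:
        return []

    # Breadth-first search, remembering each node's predecessor.
    parent = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            # Walk back from the target to the source, then reverse.
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for neighbour in graph[node]:
            if neighbour not in parent:
                parent[neighbour] = node
                queue.append(neighbour)
    return []


# Hypothetical sentence with a chemical (index 0) and a protein (index 5).
tokens = ["Aspirin", "irreversibly", "inhibits", "the", "enzyme", "COX-1"]
# Hand-written (head, dependent) arcs; an assumption, not parser output.
edges = [(2, 0), (2, 1), (2, 4), (4, 3), (4, 5)]

path = shortest_dependency_path(edges, source=0, target=5)
sdp_words = [tokens[i] for i in path]
print(sdp_words)
```

In a pipeline like the one the abstract describes, a word sequence such as sdp_words would be embedded and passed to the CNN branch, while the full sentence goes through the BERT and Bi-LSTM branch.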
Keywords/Search Tags: Chemical-protein Interaction (CPI), Pre-trained Language Model, Text Representation, Relation Extraction