Research On Multi-source Biomedical Knowledge Fusion For Drug-Target Relation Prediction

Posted on: 2022-11-06
Degree: Doctor
Type: Dissertation
Country: China
Candidate: X Zhu
GTID: 1480306758475284
Subject: Medical informatics
Abstract/Summary:
Predicting drug-target interaction (DTI) relations is important for drug development, but a single data source can no longer meet research needs. How to fuse multi-source biomedical data to discover new drug-target relations is an active and difficult research problem. Knowledge fusion, an approach developed in recent years for integrating multi-source heterogeneous data and discovering new knowledge, offers new ideas for drug-target relation prediction based on multi-source biomedical data. Research to date, in China and abroad, shows that knowledge fusion still has open problems in its fusion frameworks, fusion methods and applications, and that current methods for predicting drug-target relations also have shortcomings. This research therefore studies multi-source biomedical knowledge fusion for drug-target relation prediction, and proposes a set of knowledge fusion methods that can effectively fuse multi-source biomedical data to predict drug-target relations.

A review of the current research status identifies the following problems:
(1) The concept of knowledge fusion is used inconsistently and needs to be defined. Because of the particularities of the biomedical field, current knowledge fusion frameworks cannot be applied directly to drug-target relation prediction.
(2) Although a knowledge network can display the entities and relations in multi-source heterogeneous data in a structured way, it is not enough to reveal the complex semantic relations between biomedical entities. Knowledge fusion methods inherited from traditional information fusion algorithms have low efficiency and adaptability.
(3) In drug-target relation prediction, the multiple kinds of semantic information that exist between drugs and proteins are not considered comprehensively, and the feature dimensions used in existing studies are limited. In addition, network analysis and machine learning each have their own advantages, but their joint application has been studied little.
(4) Most current knowledge fusion research stays at the level of theoretical frameworks, and the data scale in the few practical studies is small. Applied research on knowledge fusion in biomedicine needs to be strengthened.

In response to these problems, this research first clarifies the connotation of knowledge fusion from three perspectives: "data-information-knowledge", "knowledge integration-knowledge aggregation-knowledge fusion" and "data fusion-information fusion-knowledge fusion". After summarizing the definitions proposed by earlier researchers, it puts forward its own definition of knowledge fusion. Then, based on the DIKW (Data-Information-Knowledge-Wisdom) hierarchy theory, knowledge networks, similarity calculation, meta-paths and machine learning, it builds a knowledge fusion framework consisting of a basic data layer, an association fusion layer, a feature fusion layer, a decision fusion layer, a theoretical method layer and a service application layer. The subsequent method research is carried out within this framework.

This study proposes a multi-source biomedical knowledge fusion method for drug-target relation prediction. The method has three core parts, which correspond to the three fusion levels of association-level fusion, feature-level fusion and decision-level fusion:
(1) Construction of a biomedical knowledge network fused from multi-source data. First, a biomedical knowledge network model is constructed with 4 node types (drug, protein, disease and side effect) and 6 relation types (drug-protein, drug-drug, drug-disease, drug-side effect, protein-disease and protein-protein). After a comparative analysis of 25 related biomedical databases, DrugBank, HPRD, CTD and SIDER were selected as the sources of entities and relations, and matrices were then used to link the biomedical entities. The constructed knowledge network contains 12,015 nodes and 1,895,445 edges. Finally, Cytoscape and VOSviewer are used to visualize the network.
(2) Construction of drug and protein similarity networks fused from multiple semantics. First, a drug similarity calculation method that incorporates disease semantics is proposed, based on Disease Ontology (DO) and Mass Diffusion (MD). Then the Random Walk with Restart (RWR), Jaccard, Tanimoto and Smith-Waterman algorithms are used to calculate similarities over the multiple semantic relations of drugs and proteins, and a drug similarity network and a protein similarity network fusing these relations are built.
(3) Construction of a relation prediction model that fuses network analysis and machine learning. First, meta-paths and the HeteSim algorithm are used to calculate drug-target semantic similarity, yielding 21-dimensional HeteSim features for each drug-target pair over the global heterogeneous network. Then XGBoost, Random Forest (RF) and Support Vector Machine (SVM) are used to build relation prediction models that judge whether a potential association exists between a drug and a target.

This study draws the following conclusions:
(1) In terms of theory, the connotation of knowledge fusion is analyzed and a definition of knowledge fusion is proposed. This study holds that knowledge fusion includes data fusion and information fusion, and that the ability to generate new knowledge is its defining feature.
(2) In terms of methods, the proposed biomedical knowledge fusion method for drug-target relation prediction is effective. First, a drug similarity calculation method incorporating disease semantics (DSFDS) is proposed. Experiments show that the drug similarity network obtained with this method performs better in the drug-target relation prediction task. Second, drug and protein similarity networks that fuse multiple semantics are constructed. Social network analysis shows that the drug similarity network and protein similarity network perform better after relation fusion. Finally, a drug-target relation prediction model is constructed by fusing network analysis and machine learning. Experiments show that this method outperforms the comparison methods on all evaluation indicators, and that among the three machine learning algorithms XGBoost gives better results than random forest and support vector machine.
(3) In terms of applications, new drug-target relations have been discovered with this method, and many of the predictions are supported by experimental or clinical evidence already reported in the literature. For example, the literature confirms a relation between clozapine and GABA receptor protein, and between mexiergot and 5-HT1B/1D. In addition, there may be interactions between ziprasidone and 5-HT2B, between telmisartan and prostaglandin G/H synthase 1/2, and between tamsulosin and C-X-C chemokine receptor 1. These results can serve as a reference for researchers planning experimental studies.

The innovations of this research are:
(1) The concept of knowledge fusion is analyzed from multiple perspectives and its connotation is clarified, and a multi-source biomedical knowledge fusion framework for drug-target relation prediction is constructed. This enriches the theory of knowledge fusion and provides a theoretical reference for future knowledge fusion research.
(2) Based on the DO and MD models, a drug similarity calculation method integrating disease semantic information (DSFDS) is proposed. Multiple similarity networks are constructed from the multiple semantic relations of drugs and proteins, and these similarity networks are then fused. This further enriches the relations in the constructed biomedical knowledge network, improves the objectivity and completeness of its data, and provides more diverse information and perspectives for the drug-target relation prediction task.
(3) Network analysis and machine learning methods are fused. Making full use of the topological properties of the heterogeneous knowledge network and the idea of meta-paths, 21-dimensional HeteSim features of drug-target pairs are obtained at the semantic level over the global network. These features reveal the semantic characteristics of entities more comprehensively, so that the machine-learning-based relation prediction model achieves better results.
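To make the similarity step in part (2) of the method more concrete, the following is a minimal sketch and not the thesis implementation. It computes a Jaccard similarity between drugs from shared side effects (for binary fingerprints the Tanimoto coefficient reduces to the same ratio) and then runs a Random Walk with Restart over the resulting drug similarity network. The drug names, side-effect sets, restart probability and function names are all illustrative assumptions.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets: |A & B| / |A | B| (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def random_walk_with_restart(adj, seed, restart=0.5, tol=1e-12, max_iter=1000):
    """Steady-state visiting probabilities of a walker that restarts at `seed`.

    `adj` maps every node to a dict {neighbor: weight}; every neighbor must
    itself be a key of `adj`. `restart` is an assumed restart probability.
    """
    nodes = list(adj)
    p = {n: 1.0 if n == seed else 0.0 for n in nodes}
    for _ in range(max_iter):
        # With probability `restart` the walker jumps back to the seed.
        nxt = {n: (restart if n == seed else 0.0) for n in nodes}
        for u in nodes:
            total = sum(adj[u].values())
            if total == 0:
                # Node without outgoing edges: return its mass to the seed.
                nxt[seed] += (1 - restart) * p[u]
                continue
            for v, w in adj[u].items():
                nxt[v] += (1 - restart) * p[u] * w / total
        diff = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if diff < tol:
            break
    return p


# Hypothetical toy data: side effects reported for three made-up drugs.
side_effects = {
    "drug_a": {"nausea", "headache", "dizziness"},
    "drug_b": {"nausea", "headache"},
    "drug_c": {"rash"},
}

# Drug similarity network: an edge wherever the Jaccard similarity is non-zero.
drugs = list(side_effects)
similarity_net = {d: {} for d in drugs}
for i, d1 in enumerate(drugs):
    for d2 in drugs[i + 1:]:
        s = jaccard(side_effects[d1], side_effects[d2])
        if s > 0:
            similarity_net[d1][d2] = s
            similarity_net[d2][d1] = s

sim_ab = jaccard(side_effects["drug_a"], side_effects["drug_b"])
rwr_from_a = random_walk_with_restart(similarity_net, "drug_a")
print(sim_ab, rwr_from_a)
```

In this sketch the RWR scores act as a network-aware similarity to the seed drug: drugs reachable through strong similarity edges receive more probability mass, while a drug that shares nothing with the seed receives none.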
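The meta-path features in part (3) of the method can be illustrated with a small sketch, under stated assumptions. For a length-2 meta-path such as drug-disease-protein, the normalized form of HeteSim can be computed as the cosine between the drug's one-step transition distribution over diseases and the protein's one-step transition distribution over diseases. The code below covers only this single unweighted meta-path; the abstract does not list the 21 meta-paths used in the thesis, and the entities and function names here are hypothetical.

```python
import math


def transition_probs(neighbors):
    """Uniform one-step transition distribution over a set of neighbors."""
    return {n: 1.0 / len(neighbors) for n in neighbors} if neighbors else {}


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def hetesim_length2(source_links, target_links, source, target):
    """Normalized HeteSim along a length-2 meta-path source-middle-target.

    `source_links` and `target_links` map an entity to the set of
    middle-type entities it is linked to (here: diseases).
    """
    u = transition_probs(source_links.get(source, set()))
    v = transition_probs(target_links.get(target, set()))
    return cosine(u, v)


# Hypothetical toy network: drug-disease and protein-disease associations.
drug_disease = {
    "drug_a": {"disease_1", "disease_2"},
    "drug_b": {"disease_3"},
}
protein_disease = {
    "protein_x": {"disease_1", "disease_2"},
    "protein_y": {"disease_2"},
}

# One feature per (drug, protein) pair for this single meta-path.
features = {
    (d, p): hetesim_length2(drug_disease, protein_disease, d, p)
    for d in drug_disease
    for p in protein_disease
}
for pair, score in sorted(features.items()):
    print(pair, round(score, 4))
```

Stacking one such score per meta-path would give a fixed-length feature vector for each drug-target pair, which is the kind of input a classifier such as XGBoost, random forest or SVM can be trained on.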
Keywords/Search Tags:Knowledge fusion, Knowledge network, Semantic similarity, Network analysis, Machine learning, Drug-target relation prediction