| Objective: Drug repositioning is an effective way to find treatments in response to sudden epidemic outbreaks. Large volumes of medical data provide a solid foundation for drug knowledge discovery. How to effectively integrate multi-source heterogeneous medical data to achieve multi-angle knowledge mining remains a difficult problem in drug knowledge discovery. To effectively fuse multi-source heterogeneous medical data, a knowledge graph construction framework based on multi-source data (KGCF-MD) is proposed. Based on KGCF-MD, the COVID-19 knowledge graph (CovKG) is constructed by integrating medical literature and biological databases. We then conducted COVID-19 drug repositioning to provide information support for screening therapeutic drugs for COVID-19. Methods: (1) CovKG construction. First, we obtained medical literature from PubMed and knowledge associations among chemicals, genes and diseases from CTD and DisGeNET. Next, triples were extracted from PubMed, CTD and DisGeNET. We defined attributes including entity labels, document frequency and so on. We filtered triples from three aspects: entity, relation and attribute. To retain the knowledge associations most relevant to COVID-19, we filtered by relationship type and, based on MeSH, by COVID-19-related diseases such as respiratory tract infection, among other criteria. Then, entities were aligned based on authoritative medical vocabularies. Based on the UMLS semantic relationship types, we standardized the relationship types and integrated the attributes to achieve multi-angle knowledge fusion. Finally, the Neo4j graph database and the MySQL database were used to store the fused triples, and the COVID-19 knowledge graph CovKG was constructed and visualized. (2) COVID-19 drug repositioning. Based on CovKG, knowledge associations among drugs, genes and diseases were used to construct the training set. TransE, ComplEx, DistMult and RotatE were used for pre-training. The MRR and Hits@n indicators were used to evaluate model 
performance. The best-performing model was selected to predict candidate therapeutic drugs for COVID-19. In addition, link prediction algorithms, namely CN, AA, RA and PA, were used to predict candidate therapeutic drugs for COVID-19. Finally, the intersection of the drug predictions of the two methods was taken as the set of potential therapeutic drugs for COVID-19. Based on the PubMed literature, the candidate therapeutic drugs were analyzed to explore the potential associations between the various drugs and COVID-19. Results: CovKG, containing 118,036 medical entities and 3,317,978 triples, was constructed. It involves 9 medical concepts such as genes, diseases, drugs and anatomy, 34 semantic relationships such as treat, causes and stimulate, and 6 attributes such as entity labels and document frequency. Based on CovKG, we trained graph embedding models for drug prediction. The results showed that RotatE had the best prediction performance (Hits@n = 0.49). RotatE was then combined with the link prediction algorithms to predict drugs. In total, 29 drugs related to COVID-19 were found. According to their pharmacological effects, the 29 drugs were divided into 5 categories: enzyme inhibitors, hormone drugs, anti-allergic drugs, drugs for different systemic diseases, and other drugs. Conclusion: In this study, we propose KGCF-MD, a multi-source knowledge graph construction framework that integrates medical texts and structured data, aiming to provide a technical reference for multi-source heterogeneous data fusion. Using medical vocabularies as the standard, we integrated multi-source heterogeneous data to construct the COVID-19 knowledge graph CovKG, which lays a solid data foundation for downstream application research on COVID-19. In addition, based on CovKG, we took COVID-19 drug repositioning as a case study and combined graph embedding models with link prediction algorithms to predict potential therapeutic drugs for COVID-19. Drug prediction analysis shows 
that this method is feasible for studying drug repositioning and can provide decision support for preclinical drug screening. |
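
The evaluation indicators named in the Methods, MRR and Hits@n, can be illustrated with a minimal sketch. It assumes that each test triple has already been assigned the 1-based rank of its true entity among all candidates; the `ranks` list below is made-up illustrative data, not output from the study, and the function names are hypothetical.

```python
def mrr(ranks):
    """Mean reciprocal rank: average of 1/rank over all test triples."""
    return sum(1.0 / r for r in ranks) / len(ranks)


def hits_at_n(ranks, n):
    """Fraction of test triples whose true entity is ranked within the top n."""
    return sum(1 for r in ranks if r <= n) / len(ranks)


# Illustrative ranks only (assumption): rank of the correct entity per test triple.
ranks = [1, 3, 2, 10]

print(f"MRR    = {mrr(ranks):.4f}")
print(f"Hits@1 = {hits_at_n(ranks, 1):.2f}")
print(f"Hits@3 = {hits_at_n(ranks, 3):.2f}")
```

Both indicators lie between 0 and 1, and higher values mean the correct entity tends to be ranked nearer the top.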
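
The link prediction step can be sketched in the same spirit. CN, AA, RA and PA are read here in their conventional sense as Common Neighbors, Adamic-Adar, Resource Allocation and Preferential Attachment; the abstract does not spell them out, so this reading is an assumption. The toy graph, the node names, the top-2 cutoff and the `embedding_candidates` set are all hypothetical, and the code is not the authors' implementation.

```python
import math


def build_adjacency(edges):
    """Build an undirected adjacency map {node: set(neighbors)} from edge pairs."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def neighbor_scores(adj, u, v):
    """Return the CN, AA, RA and PA scores for the node pair (u, v)."""
    nu, nv = adj.get(u, set()), adj.get(v, set())
    shared = nu & nv
    return {
        "CN": len(shared),
        # Skip degree-1 nodes to avoid dividing by log(1) = 0.
        "AA": sum(1.0 / math.log(len(adj[z])) for z in shared if len(adj[z]) > 1),
        "RA": sum(1.0 / len(adj[z]) for z in shared),
        "PA": len(nu) * len(nv),
    }


def rank_candidates(adj, target, candidates, metric):
    """Sort candidate drugs by one score against the target, highest first."""
    return sorted(candidates,
                  key=lambda d: neighbor_scores(adj, d, target)[metric],
                  reverse=True)


# Hypothetical toy graph: drugs and the disease are linked through shared genes.
edges = [
    ("drug_a", "gene_1"), ("drug_a", "gene_2"), ("drug_b", "gene_2"),
    ("covid", "gene_1"), ("covid", "gene_2"), ("covid", "gene_3"),
]
adj = build_adjacency(edges)

ranked = rank_candidates(adj, "covid", ["drug_a", "drug_b", "drug_c"], "CN")
link_candidates = set(ranked[:2])            # top-2 cutoff is an assumption
embedding_candidates = {"drug_a", "drug_c"}  # stand-in for embedding-model output

# Keep only the drugs proposed by both methods.
shortlist = link_candidates & embedding_candidates
print(neighbor_scores(adj, "drug_a", "covid"))
print(shortlist)
```

Taking the intersection trades recall for precision: a drug is kept only when the embedding model and the neighborhood-based scores agree.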