Research on Ontology-Based Methods for Mining Disease Molecular Markers

Posted on: 2021-05-13
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y T Wang
GTID: 1480306569983449
Subject: Computer application technology
Abstract/Summary:
With the continuous growth of biomedical data, our understanding of diseases has deepened. Quantifying disease similarity and identifying disease-related biomarkers are important for explaining the pathogenic mechanisms and prognosis of diseases. Because disease-related data are diverse and heterogeneous, analyzing and mining disease molecular markers efficiently remains challenging. This dissertation studies methods for mining disease-related molecular markers. The main work covers the following four aspects.

(1) We propose a method for calculating disease similarity based on functional network fusion. Disease similarity is commonly calculated from the functional associations between genes. In a gene functional association network, however, a target gene is associated not only with its adjacent genes but also, indirectly, with other genes. In practice, existing methods usually consider only the direct links between genes and ignore the indirect ones. Moreover, compared with a fusion of multiple functional association networks, a single network is limited in both data volume and construction method. We therefore propose a disease similarity algorithm based on the fusion of gene functional association networks. First, disease-related genes are annotated with Disease Ontology terms. Then, the weights between genes in the multiple gene functional association networks are recalculated by a global optimization algorithm. Finally, the comprehensive similarity between diseases is calculated from the disease-related gene annotations and the semantic structure of the Disease Ontology. The experimental results show that, compared with existing methods, the proposed algorithm further improves the accuracy of disease similarity calculation.

(2) We propose a computational model that identifies potential disease-related metabolites based on disease similarity and literature reference scores between metabolites. A large number of associations between diseases and metabolites have already been found in biochemical experiments. Compared with the total number of diseases and metabolites, however, the number of known associations is relatively small. The resulting data sparsity means that potential associations between diseases and metabolites often cannot be identified accurately. First, we build a disease vocabulary from medical subject terms and the disease terms in the Disease Ontology, which is used to map disease terms to metabolites. Then, a metabolite association network is established with metabolite similarity as the edge weight, where the similarity is calculated from both the similarity between diseases and the reference scores of the metabolite-related literature. Finally, potential associations between metabolites and diseases are identified with a hybrid recommendation model. Nineteen diseases are selected to validate the model based on the differences between the databases, and the results show that the model successfully predicts potential disease-related metabolic signatures.

(3) We propose a computational model for predicting ncRNA-disease associations based on multiple biological datasets. In studies of the association between diseases and ncRNAs, disease similarity is often used as a reference factor for calculating ncRNA similarity. Disease is not the only relevant factor, however; other information also bears on ncRNA similarity. The associations between ncRNAs can be quantified more accurately if multiple features associated with potential biomarkers are used together. First, we name ncRNAs uniformly. Then, according to the characteristics of the different data sources, we define the similarity of lncRNAs and of circRNAs separately, and construct a multilayer heterogeneous network composed of diseases and ncRNAs from the mappings between them. Finally, we prioritize candidate disease-related ncRNAs based on the topology of this multilayer network.

(4) We propose a model for predicting disease-related biomarkers based on a knowledge graph. The disease-related information in existing disease knowledge graphs often covers only one type of disease-related feature. This allows that feature to be studied specifically, but it limits how fully a disease can be described. After collecting disease-related data sources, we integrate the relationships between entities and build the disease knowledge graph through extraction of disease-related information, terminology annotation, and knowledge combination. After analyzing the characteristics of the disease-related information in the knowledge graph, we learn latent representations of the disease and biomarker nodes using a Graph Auto-Encoder architecture. Potential associations between diseases and biomarkers are then predicted, and visual annotation of disease-related knowledge is provided, which helps in understanding the pathogenesis of diseases.
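
The abstract does not say how the topology of the multilayer network in aspect (3) is used to rank candidates. As an illustration only, the sketch below uses random walk with restart, a common network-propagation technique that is assumed here and not confirmed by the source. The node names (`D1`, `r1`, ...), the edge weights, and the restart probability are all hypothetical.

```python
from collections import defaultdict


def build_network(edges):
    """Build a symmetric weighted adjacency map from (u, v, weight) triples."""
    adj = defaultdict(dict)
    for u, v, w in edges:
        adj[u][v] = w
        adj[v][u] = w
    return dict(adj)


def random_walk_with_restart(adj, seeds, restart=0.5, tol=1e-10, max_iter=1000):
    """Return steady-state visiting probabilities for a walk restarting at `seeds`.

    Assumes every seed is a node of `adj`. The restart probability is a
    hypothetical choice, not a value taken from the dissertation.
    """
    nodes = sorted(adj)
    out_weight = {u: sum(adj[u].values()) for u in nodes}
    p0 = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        # Restart step: a fixed share of the probability mass returns to the seeds.
        nxt = {n: restart * p0[n] for n in nodes}
        for u in nodes:
            if out_weight[u] == 0:
                # Isolated node: send its mass back to the seeds.
                for n in nodes:
                    nxt[n] += (1 - restart) * p[u] * p0[n]
                continue
            # Walk step: move to neighbours in proportion to edge weight.
            for v, w in adj[u].items():
                nxt[v] += (1 - restart) * p[u] * w / out_weight[u]
        diff = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if diff < tol:
            break
    return p


# Hypothetical toy network: disease D1 has one known ncRNA (r1);
# r1-r2 and r2-r3 are ncRNA similarity edges.
network = build_network([
    ("D1", "r1", 1.0),
    ("r1", "r2", 0.9),
    ("r2", "r3", 0.2),
])
scores = random_walk_with_restart(network, seeds={"D1"})

# Rank ncRNAs that are not yet linked to D1 by their propagation score.
candidates = ["r3", "r2"]
ranked = sorted(candidates, key=lambda n: scores[n], reverse=True)
print(ranked)
```

In this toy setting the score of a candidate falls with its distance from the seed disease and with weaker similarity edges along the path, which is the intuition behind topology-based prioritization. A real multilayer network would also need a rule for weighting jumps between the disease layer and the ncRNA layers.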
Keywords/Search Tags: disease ontology, disease biomarker, multi-omics, disease similarity, similarity network