
Complex Disease Characterization Based On Sample-specific Network

Posted on: 2022-05-22
Degree: Master
Type: Thesis
Country: China
Candidate: Y H Huang
Full Text: PDF
GTID: 2480306311450504
Subject: Operational Research and Cybernetics
Abstract/Summary:
A biological network is a way of characterizing a complex biological system: the nodes represent biological molecules, and the edges represent interactions or regulatory relationships between molecules. In recent years, with the accumulation of biological big data and the progress of data science, biological networks have played an increasingly important role in life science research. Many studies have shifted their focus from identifying differentially expressed genes to identifying differential network modules, and from identifying pathogenic genes to identifying pathogenic network modules. So far, however, research on constructing sample-specific networks for individuals has been very limited, even though the specific network of a single sample is crucial for elucidating the molecular mechanisms of an individual's disease and for understanding tumor heterogeneity. It is therefore urgent to establish and apply bioinformatics theory to this core issue. At present, almost all biological networks are aggregated networks built from a group of samples; they cannot reflect the specific pathogenic mechanism of an individual patient and contain many false-positive edges.

To address this problem, this thesis focuses on the individual patient and constructs a sample-specific network from a single sample. The core of the sample-specific network is to quantify the statistical perturbation of one sample against a group of samples. For each type of cancer, a set of reference samples is required, and the reference network is constructed from their partial correlation coefficients. A new sample is then added to the reference samples to form a combined sample set, and the perturbed network is constructed from the partial correlation coefficients of this combined set. Finally, the difference between the perturbed network and the reference network gives the differential network. The key step is to screen the significant differential edges in the differential network, that is, the significant differential partial correlation coefficients. Through mathematical derivation and statistical inference, this study shows that the differential partial correlation coefficient approximately follows a normal distribution, so a z-test can be used to decide whether a differential partial correlation coefficient is significant. The edges that pass the test form the sample-specific network of the individual.

After validating the biological significance of the sample-specific network, the thesis addresses three main problems. First, based on the similarity between sample-specific networks, a new distance, the network distance, is defined, and a clustering model is designed on top of it. Applying the clustering model to tumor data from The Cancer Genome Atlas and to single-cell data validates the effectiveness of the sample-specific network in identifying subtypes, distinguishing different types of cancer, and classifying single cells. Second, a novel method for identifying individual driver genes is proposed: the more differentially expressed genes a gene regulates, the more likely it is to be a driver gene for that individual. The predicted top 10, top 15 and top 20 driver genes of a patient were tested for enrichment in known cancer-related genes, and the results show that the predicted cancer driver genes are reasonable. Third, the driver genes of a specific cancer are obtained by computing the top 10 potential driver genes of each patient. Benchmarked against nine other methods for identifying driver genes, the proposed method performs better than all nine.

The key contribution of this thesis is the construction of a sample-specific network from a single sample, which is then used to solve a series of biological problems. The reliability and feasibility of the sample-specific network are demonstrated both in theory and in application.
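
The construction described in the abstract (reference network, perturbed network, differential network, z-test) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation. It assumes that the partial correlation of two genes is taken given all other genes and is computed from the inverse covariance matrix, which requires more samples than genes. It also assumes that the standard deviation of the differential partial correlation under the null hypothesis is supplied by the caller as `sigma`; the abstract says this distribution is derived in the thesis but does not state the formula. All function and parameter names are hypothetical.

```python
import math

def covariance(rows):
    """Sample covariance matrix; each row is one sample, each column one gene."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    return [[sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows) / (n - 1)
             for j in range(p)] for i in range(p)]

def invert(m):
    """Gauss-Jordan inversion with partial pivoting (assumes m is non-singular)."""
    p = len(m)
    aug = [list(row) + [float(i == j) for j in range(p)] for i, row in enumerate(m)]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        d = aug[col][col]
        aug[col] = [x / d for x in aug[col]]
        for r in range(p):
            if r != col:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [row[p:] for row in aug]

def partial_corr(rows):
    """Partial correlation given all other genes: -P_ij / sqrt(P_ii * P_jj),
    where P is the inverse covariance (precision) matrix. ASSUMPTION: the
    thesis may condition on a different set of genes."""
    prec = invert(covariance(rows))
    p = len(prec)
    return [[1.0 if i == j else -prec[i][j] / math.sqrt(prec[i][i] * prec[j][j])
             for j in range(p)] for i in range(p)]

def sample_specific_edges(reference, new_sample, sigma, alpha=0.05):
    """Return {(i, j): (delta, z, p_value)} for edges whose differential
    partial correlation passes a two-sided z-test at level alpha.
    ASSUMPTION: sigma is the null standard deviation of delta, supplied by
    the caller; it is not the formula derived in the thesis."""
    ref = partial_corr(reference)                  # reference network
    pert = partial_corr(reference + [new_sample])  # perturbed network
    p = len(ref)
    edges = {}
    for i in range(p):
        for j in range(i + 1, p):
            delta = pert[i][j] - ref[i][j]         # differential network entry
            z = delta / sigma
            p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail
            if p_value < alpha:
                edges[(i, j)] = (delta, z, p_value)
    return edges
```

Here `reference` is a list of expression vectors (one per reference sample) and `new_sample` is the expression vector of the patient of interest. A real analysis would also need multiple-testing correction across edges, which this sketch omits.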
Keywords/Search Tags:Differential partial correlation coefficient, Sample-specific network, Network distance, Driver gene