
Integration And Quantitative Analysis Of High-dimensional Biomedical Data Based On Multi-scale Networks

Posted on: 2020-05-21
Degree: Doctor
Type: Dissertation
Country: China
Candidate: D J Wang
Full Text: PDF
GTID: 1360330590453827
Subject: Computational Mathematics
Abstract/Summary:
With the rapid development of high-throughput biotechnology, a large number of high-dimensional heterogeneous omics datasets have been collected. A central and difficult problem in current biomedical research is how to integrate high-dimensional omics datasets from different sources and scales in order to study the laws of life activities and the internal mechanisms of complex diseases in complex biological systems. In this dissertation, we integrate multi-dimensional heterogeneous high-throughput omics datasets and apply new mathematical models and optimization algorithms to model complex biological systems as multi-scale biological networks. Based on these network models, and using tensor computation, graph theory and statistical methods, we design quantitative measures related to biological functions and dynamical processes at three network levels: single static networks, multilayer networks and temporal multilayer networks. These measures are used to address several important scientific problems. The main innovative contributions of this dissertation are summarized as follows.

1. Prediction of biological functions for gene isoforms based on mathematical modelling and quantitative analysis of co-expression networks. Through mathematical modelling and quantitative analysis of co-expression networks, this dissertation explores two important scientific problems: (i) for different isoforms encoded by the same gene, designing quantitative measures to identify which isoforms are similar or significantly different in biological function; (ii) predicting and annotating the biological functions of different isoforms encoded by the same gene. First, two new methods, referred to as MINet and RVNet, are proposed to reconstruct co-expression networks from exon-level expression data. Specifically, MINet is a novel statistical hypothesis test based on the mutual information matrix of an isoform-isoform (or gene-gene) pair, and RVNet is a novel application of the matrix RV coefficient to quantitatively evaluate the correlation between any two genes or isoforms. Numerical experiments demonstrate that MINet has higher prediction accuracy when the sample size is sufficient and the exon numbers of the two genes (or isoforms) differ greatly, while RVNet performs best for small samples. Furthermore, by comparing the advantages of the two methods, we integrate MINet and RVNet into a unified framework, referred to as the Iso-Net method, to infer co-expression networks. Second, using 109 gene isoforms of 12 important transcription factors in human myeloid differentiation as research targets, gene-isoform co-expression networks are constructed with the Iso-Net method. By defining quantitative metrics such as the Jaccard similarity coefficient between nodes, we identify 21 special gene isoforms in 7 transcription factors whose co-expression relationships differ significantly from those of the other isoforms encoded by the same gene in the corresponding cell lines. Based on the set of neighbors of each isoform in the co-expression networks, the biological functions of each isoform are predicted by GO functional enrichment analysis. These results present a novel framework for analyzing and predicting the biological functions of gene isoforms, and provide a higher-resolution method for studying the biological functions of genes in biomedical research.

2. Identifying key nodes in multilayer networks under the framework of tensor computation. Using a fourth-order tensor to represent multilayer networks, we propose a new centrality measure, referred to as the Singular Vector of Tensor (SVT) centrality, which quantitatively evaluates the importance of nodes connected by different types of links in multilayer networks. First, we present a novel iterative method to obtain four alternative metrics that quantify the hub and authority scores of nodes and layers in multilayer networked systems. Moreover, we use the theory of multilinear algebra to prove that, under reasonable conditions, the four metrics converge to four singular vectors of the adjacency tensor of the multilayer network. The SVT centrality measure is then obtained by integrating these four metrics. Experimental results demonstrate that the proposed measure significantly outperforms six other published centrality methods on two real-world multilayer networks related to complex diseases, namely gastric and colon cancers. These results present a novel centrality measure based on tensor computation, which provides new ideas and tools for exploring the pathogenic genes of complex diseases and for screening drug targets.

3. Controllability and control energy of multi-scale networks. Controllability theory has been widely applied to complex biological networks. Studying the controllability of biological networks can shed light on many key physiological or medical problems from a systems perspective, such as the identification of drug targets, which is of great importance to improving human life. In this dissertation, the controllability and control energy of multi-scale networks are studied at two network levels: single-layer static networks and multilayer networks. The main theoretical and numerical results are summarized as follows. For single-layer static networks, using matrix algebra and graph theory, we first explore the estimation of the control energy of complex networks. The theoretical results demonstrate that controlling unstable networks is easier than controlling stable networks of the same size. Numerical simulations reveal that the control energy cost is negatively correlated with node degree; more specifically, a combination of control nodes with a greater total degree requires less energy to achieve complete control. Finally, based on these results, we propose a multi-objective optimization model to obtain a control strategy that uses fewer control nodes and also incurs a lower control energy cost. For multilayer networks, we systematically study, both theoretically and numerically, how the coupling strength and the connection patterns between layers affect the controllability and control energy of multilayer networks. First, combining theoretical derivation with numerical simulation, we show that the control energy depends approximately linearly on the coupling strength, while the controllability measure is a piecewise function of the coupling strength. Second, numerical experiments reveal that the HH (High-degree-High-degree) interlayer connection pattern is the best choice for control energy but the worst option for the controllability of multilayer networks. These results provide a methodology for selecting the coupling strength and coupling patterns to maximize controllability and minimize the control energy cost.

4. Mathematical modelling and quantitative analysis of temporal multilayer networks integrating time and space scales. Most real-world and engineered systems change dynamically across time and spatial scales. To integrate multi-dimensional heterogeneous data at different scales, and thereby analyze the life activities of complex biological systems and the internal mechanisms of complex diseases, we develop a more general mathematical model, referred to as a temporal multilayer network, which explicitly incorporates time dependence and multiple relationships of topological structure into one system and provides a natural and reasonable description of real-world complex systems. Furthermore, using a fifth-order tensorial framework to represent temporal multilayer networks, we propose several important topological metrics, including overlapping degree, entropy, degree correlation and link overlap, to quantitatively evaluate temporal multilayer networks. In particular, based on the tensorial framework, we extend two well-known iterative refinement centralities to temporal multilayer networks, referred to as the TM-eigenvector and TM-PageRank centralities, which quantitatively evaluate the importance of nodes in real-world complex systems. In addition, we use the theory of multilinear algebra and matrix analysis to rigorously prove the convergence of these iterative algorithms. The above measures are applied to two real temporal multilayer biological networks, and the numerical experiments reveal that the proposed centrality methods achieve higher prediction accuracy, resolution and convergence rate.
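
As a rough illustration of two quantities named in part 1 of the abstract, the sketch below computes the standard matrix RV coefficient between two expression matrices and the Jaccard similarity between two neighbor sets. It is a minimal sketch, not the dissertation's RVNet or Iso-Net procedure, which may involve further steps such as significance testing or thresholding. The data layout (rows are samples, columns are exons) and all function names and example identifiers are assumptions made for illustration.

```python
from math import sqrt


def center_columns(m):
    """Subtract each column's mean; rows are assumed to be samples."""
    n = len(m)
    means = [sum(col) / n for col in zip(*m)]
    return [[v - mu for v, mu in zip(row, means)] for row in m]


def gram(m):
    """Return the n x n sample-by-sample matrix M M^T."""
    return [[sum(a * b for a, b in zip(r1, r2)) for r2 in m] for r1 in m]


def frobenius_inner(a, b):
    """Sum of elementwise products; equals trace(A B) for symmetric A, B."""
    return sum(p * q for ra, rb in zip(a, b) for p, q in zip(ra, rb))


def rv_coefficient(x, y):
    """Standard RV coefficient between two matrices with the same rows.

    x is n x p and y is n x q (p and q may differ, e.g. different exon
    counts). The result lies in [0, 1]; 0.0 is returned if either
    matrix has no variance.
    """
    sx = gram(center_columns(x))
    sy = gram(center_columns(y))
    denom = sqrt(frobenius_inner(sx, sx) * frobenius_inner(sy, sy))
    return frobenius_inner(sx, sy) / denom if denom > 0 else 0.0


def jaccard(a, b):
    """Jaccard similarity |A & B| / |A | B| of two neighbor sets.

    Returning 0.0 for two empty sets is a convention chosen here.
    """
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


if __name__ == "__main__":
    # Hypothetical exon-level expression: 4 samples, 3 exons vs 2 exons.
    isoform_a = [[1.0, 2.0, 0.5], [2.0, 1.0, 1.5], [3.0, 5.0, 2.0], [4.0, 3.0, 2.5]]
    isoform_b = [[0.9, 2.1], [2.2, 1.1], [2.8, 4.7], [4.1, 3.2]]
    print("RV coefficient:", rv_coefficient(isoform_a, isoform_b))

    # Hypothetical neighbor sets of two isoforms in a co-expression network.
    print("Jaccard:", jaccard({"G1", "G2", "G3"}, {"G2", "G3", "G4"}))
```

Because the RV coefficient compares sample-by-sample Gram matrices, it applies to two genes or isoforms with different numbers of exons, which is why a matrix-level measure suits exon-level data. A low Jaccard similarity between the neighbor sets of two isoforms of the same gene would flag them as candidates for differing co-expression behavior.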
Keywords/Search Tags: Biomedicine, Multi-dimensional Omics Data, Multi-scale Networks, Gene Isoforms, Network Control, Network Centrality