Research On The Inference Of Gene Regulatory Networks Modeled With Structural Equation Models

Posted on: 2021-01-04
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y Li
GTID: 1360330611471890
Subject: Computer software and theory
Abstract/Summary:
In living organisms, genes are the basic units that control biological traits. They usually do not work in isolation but interact with one another, and during gene expression these interactions form intricate gene regulatory networks (GRNs). Inferring GRNs is important for discovering gene functions and understanding regulatory mechanisms at the molecular level. Gene expression data are often high-dimensional while the sample size is usually small, so it is impractical to test the interaction between every pair of genes experimentally. Instead, the GRN is generally described by a computational model, which is then learned and inferred with appropriate machine learning algorithms. This raises two important issues: the choice of model and the corresponding parameter estimation method.

At present, a variety of parametric models are widely used in GRN inference, most of which exploit only gene expression data to infer the network structure. However, experimentally generated or naturally occurring genetic perturbations may also affect GRNs, and taking them into account could further improve inference accuracy. The structural equation model (SEM) provides a systematic framework that integrates gene expression data and genetic perturbations into one model to jointly infer the GRN structure, and it has become one of the most promising GRN modeling methods. In this dissertation, GRNs are modeled mainly with SEMs, and the study addresses two aspects: the inference of GRNs under a single condition and the joint inference of GRNs under two different conditions. Several novel optimization methods are proposed to solve the corresponding parameter estimation problems, so as to infer the network structure of GRNs as accurately as possible and to provide solutions to related biological problems at the gene level. The main research efforts and contributions are summarized as follows.

1. The modelling and inference of GRNs under a single condition

Biological evidence suggests that, although an organism has a large number of genes, a gene usually regulates or is regulated by only a small number of other genes; that is, GRNs, like biological networks more generally, are sparse. Therefore, when inferring the structure of GRNs from a computational model, general parameter estimation methods cannot be applied directly, and specific algorithms suited to gene expression data need to be designed.

1) The dissertation first provides a comprehensive review of the commonly used parametric GRN models and the related inference algorithms. For each model, the parameterization used to model GRNs is described, followed by a brief introduction to some classical sparse inference algorithms based on that model. Finally, the advantages and disadvantages of each parametric model for GRN inference are compared and summarized.

2) An inference algorithm named BaNEG is proposed for SEM-based GRNs under a single condition. The algorithm first integrates the gene expression matrix and the genetic perturbations through a simple re-parameterization to obtain a model in linear regression form. A Bayesian inference algorithm based on the NEG (Normal-Exponential-Gamma) hierarchical prior is then used for sparse estimation of the re-parameterized model. Finally, simulations on synthetic data sets with different settings verify that BaNEG achieves high accuracy on sparse SEM inference problems.

2. The modelling and joint inference of GRNs modeled with SEMs under two different conditions

For the same set of genes under different conditions (e.g., different environments, tissue types, or disease states), the expression level of each gene may change slightly and the GRN structure under each condition may vary somewhat. Inferring the GRN structures under the two conditions independently accounts for the differences between the GRNs but easily ignores the correlations and similarities between them, so the joint inference of differential GRNs has begun to attract attention. In this dissertation, three joint inference algorithms are proposed for SEM-based GRNs under different conditions.

1) A joint GRN inference method named DiffSSEM, based on the proximal gradient optimization algorithm, is proposed. First, the two GRNs under different conditions are modeled with two SEMs, and an SEM-based joint inference optimization model is constructed by considering the sparsity of each GRN and the similarity of the two GRN structures. Then, through simple model transformations, the joint inference model is converted into a convex optimization model, which is solved with a proximal gradient algorithm to infer the structures of the two GRNs and of the differential GRN. Finally, DiffSSEM is compared with a naive algorithm that infers the GRN structures under the different conditions independently; the results show that the joint inference method achieves significantly better network inference performance than the naive method.

2) A joint GRN inference algorithm named ReBDA, based on model re-parameterization, is proposed. It focuses on the re-parameterization of the two paired SEMs. First, a new re-parameterization method integrates the two SEMs into a single SEM containing all the information, whose parameters combine the separate GRN matrix and the differential GRN matrix. Then, a suitable Bayesian sparse SEM inference algorithm is applied directly to the re-parameterized model to infer the sparse GRN matrix and the differential GRN matrix simultaneously. Finally, ReBDA is compared experimentally with ReDNet, a joint GRN inference algorithm based on a re-parameterization approach that was presented at the UAI (Uncertainty in Artificial Intelligence) conference in 2018; the results show that ReBDA has better network inference performance.

3) A joint inference algorithm for differential GRNs named BFDSEM, based on a Bayesian fused prior, is proposed. First, the original two paired SEMs are integrated into a linear regression model through model re-parameterization. Then, starting from the joint inference optimization model that accounts for the sparsity of the individual GRNs and of the differential GRN, a Bayesian fused hierarchical prior is proposed for the linear model to achieve the same effect; the corresponding conditional posterior distributions are derived in conjunction with the likelihood, and the parameters are estimated by Gibbs sampling. Finally, BFDSEM is compared with ReDNet and with the FSSEM algorithm, published in 2019 in Bioinformatics, a top journal in the field; the results show that BFDSEM significantly improves on ReDNet in network inference performance and is comparable to FSSEM, the current best-performing algorithm.
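
The SEM formulation and the re-parameterization idea described in the abstract can be illustrated with a minimal Python sketch. It is not the dissertation's implementation, and every function name in it is hypothetical. It assumes the common SEM form Y = BY + FX + E, with a sparse GRN matrix B that has a zero diagonal, one perturbation per gene (so F is diagonal), and expression and perturbation matrices of shape (genes, samples). To keep the example short, each gene's equation is treated as an ordinary regression, which is a simplification of full SEM inference. A plain lasso penalty solved by proximal gradient (ISTA) stands in for the NEG and fused priors named above. The two conditions are coupled by writing the second condition's coefficients as w2 = w1 + delta, so that sparsity on delta favors similar networks.

```python
import numpy as np

def simulate_sem(B, F, X, noise_std=0.1, seed=0):
    """Draw Y from Y = B Y + F X + E, i.e. Y = (I - B)^{-1} (F X + E)."""
    rng = np.random.default_rng(seed)
    n_genes, n_samples = B.shape[0], X.shape[1]
    E = noise_std * rng.standard_normal((n_genes, n_samples))
    return np.linalg.solve(np.eye(n_genes) - B, F @ X + E)

def gene_design(Y, X, i):
    """Regressors for gene i: other genes' expression, then gene i's own perturbation."""
    others = np.delete(Y, i, axis=0)            # (n_genes - 1, n_samples)
    return np.vstack([others, X[i:i + 1]]).T    # (n_samples, n_genes)

def stack_two_conditions(A1, A2):
    """Joint design for parameters (w1, delta), where w2 = w1 + delta (assumed coupling)."""
    return np.block([[A1, np.zeros_like(A1)], [A2, A2]])

def soft_threshold(v, t):
    """Proximal operator of t * ||v||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def lasso_ista(A, y, lam, n_iter=500):
    """Minimize 0.5 * ||y - A w||^2 + lam * ||w||_1 by proximal gradient (ISTA)."""
    step = 1.0 / np.linalg.norm(A, 2) ** 2      # 1 / Lipschitz constant of the gradient
    w = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = A.T @ (A @ w - y)
        w = soft_threshold(w - step * grad, step * lam)
    return w

def joint_fit_gene(Y1, X1, Y2, X2, i, lam):
    """Fit gene i in both conditions; returns (condition-1 coefficients, differences)."""
    A = stack_two_conditions(gene_design(Y1, X1, i), gene_design(Y2, X2, i))
    y = np.concatenate([Y1[i], Y2[i]])
    w = lasso_ista(A, y, lam)
    p = A.shape[1] // 2
    return w[:p], w[p:]
```

Calling `joint_fit_gene` once per gene would give one row of the condition-1 network and one row of the differential network at a time. The penalty weight `lam` is an illustrative tuning parameter that would need to be chosen, for example by cross-validation.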
Keywords/Search Tags: Gene regulatory networks, structural equation models, gene expression data, genetic perturbations, sparsity, joint inference