Inference And Analysis Of Regulatory Networks For Multi-dimensional Biomedical Data

Posted on: 2020-02-17
Degree: Doctor
Type: Dissertation
Country: China
Candidate: A J Fan
GTID: 1360330590953827
Subject: Computational Mathematics
Abstract/Summary:
With the development of high-throughput technology, large volumes of genome-wide SNP, proteome, epigenome, transcriptome, metabolome, and other high-dimensional multi-omics biomedical data are generated in experiments. These data are characterized by small sample sizes and high dimensionality. The large amount of high-dimensional multi-omics biomedical data provides an opportunity to study and reveal the interactions between genes and their products. Networks play an important role in data mining, and network-based systems biology has become a powerful tool for studying the complex behavior of biological systems. In this dissertation, based on high-dimensional multi-omics biomedical data, models are established and intelligent optimization algorithms are proposed to study the regulatory mechanisms between genes at two network levels, namely the large-scale gene regulatory network and the multi-layer regulatory network. The dissertation therefore focuses on the construction and analysis of regulatory networks from high-dimensional multi-omics biomedical data. The main research contents cover the following three aspects.

1. For high-dimensional time-series gene expression data, a novel algorithm based on randomized singular value decomposition is proposed to infer large-scale gene regulatory networks. High-dimensional time-series gene expression data are high-dimensional, contain few time points, and are noisy. To address these difficulties, an algorithm based on singular value decomposition is proposed to construct a large-scale gene regulatory network. First, the regulatory relationships between genes are described by an ordinary differential equation model, and the problem of constructing the gene regulatory network is transformed into an optimization problem of estimating the model parameters. Then, by combining the time-series gene expression data with a Gaussian matrix, the data are smoothed and the noise can be
reduced. Finally, a random strategy is introduced into the proposed algorithm to reduce the dimensionality of the high-dimensional gene expression data. In general, only a few genes in gene expression data play an important role in gene regulatory networks. The random strategy randomly selects a subset of the genes in the data, rather than all of them, to infer the gene regulatory network. Compared with the original high-dimensional data, the dimension-reduced data make it easier to construct an effective gene regulatory network. At the same time, it is easy to introduce too many false positives when constructing a gene regulatory network. We therefore introduce an iterative strategy into the proposed algorithm to improve the precision of the constructed gene regulatory network and reduce its false positives. Research shows that biological networks are usually very sparse. To evaluate gene regulatory networks accurately and effectively, two new network evaluation indexes, expectation precision and expectation error, are proposed. Because gene regulatory networks are highly sparse, the number of regulatory edges in a network is much smaller than the size of the network. As a result, the numbers of false positives and false negatives and the numbers of true positives and true negatives are not on the same order of magnitude. The expectation precision and expectation error indexes take this high sparsity into account. The results of numerical experiments show that the two proposed indexes evaluate networks more reasonably than accuracy and error rate do. To verify its performance, the proposed algorithm is tested on four well-known benchmark data sets from the Dialogue for Reverse Engineering Assessments and Methods (DREAM) challenge. The experimental results show that a high-precision, sparse gene regulatory network can be constructed by the proposed large-scale
gene regulatory network algorithm based on singular value decomposition from high-dimensional time-series gene expression data.

2. Based on transcriptome data of respiratory syncytial virus infection in different experimental settings, gene regulatory networks are constructed and key modules of the networks are identified by combining database information with an optimization algorithm. Vaccination against respiratory syncytial virus infection is likely to cause vaccine-enhanced disease. To study the mechanism of vaccine-enhanced disease, an optimization algorithm that incorporates existing databases is proposed to construct gene regulatory networks under different experimental conditions from the transcriptome data of respiratory syncytial virus infection. To reduce the dimension of the experimental data, fold change and the t-test are used to screen the differentially expressed genes in the respiratory syncytial virus data when constructing the networks. Our study has shown that gene regulatory networks are too large to be understood by observational and descriptive methods. ClusterONE, a module detection algorithm, is used to identify important modules in the gene regulatory network and to reduce its size. In the gene regulatory network under each experimental setting, multiple key modules can be identified whose genes often participate in the same biological process or have similar biological functions. Across experimental conditions, we need to pick out not only the modules with high similarity but also the modules with large diversity. To select the modules that differ strongly between experimental conditions or time points, we propose a module difference measurement index, the difference degree of network for modules. According to this index, modules that differ strongly across experimental conditions can be
selected. According to the difference degree of network, two modules of the respiratory syncytial virus infection network are selected. To study the relationships between the biological components within the modules, we use the DAVID bioinformatics database to annotate the biological functions of the two identified modules. The results of KEGG pathway analysis and GO functional enrichment analysis show that most of the genes in the modules are enriched in immune-related biological processes and pathways. This suggests that the selected modules are associated with the enhanced immune response to respiratory syncytial virus.

3. For multi-omics biomedical data, a multi-layer regulatory network is constructed to integrate multiple types of omics data. With the development of high-throughput technology, a large number of different types of omics data are generated in biological experiments. By integrating multi-omics data in a multi-layer regulatory network, the regulatory relationships among genes can be mined more comprehensively and accurately. First, a set of ordinary differential equation models is used to describe the multi-layer regulatory network, and the problem of integrating multi-omics data to construct the multi-layer regulatory network is transformed into an optimization problem of estimating the model parameters. Second, we propose a recursive regularization algorithm (RRA) to infer the multi-layer regulatory network from the integrated multi-omics data. When a multi-layer regulatory network is inferred, the dimensionality of the data leads to false positive and false negative regulatory relationships in the network, and indirect regulation between variables is the main cause of false positives. To reduce the false positives and false negatives of the multi-layer regulatory network, a framework for optimizing the network is proposed. To reduce the false negatives in the regulatory relationships, we propose a
dynamic threshold strategy to judge the validity of the regulatory relationships among variables within the framework for building a multi-layer regulatory network. For each variable, the relationship values with all other variables that fall below the quartile are set to zero, and in the next step of the recursive regularization algorithm only the non-zero relationships are re-estimated. Filtering the relationships between variables with CMI2 (conditional mutual inclusive information) reduces the false positives generated by indirect regulation. Two sets of simulated data and two types of real biological data are used to evaluate the performance of the recursive regularization algorithm and to compare it with the NARROMI, HalfThr, and CMI2NI algorithms. The experimental results show that the RRA algorithm can effectively integrate biological multi-omics data to reconstruct the multi-layer regulatory network.
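
The dynamic threshold step described in the third part can be illustrated with a minimal sketch. It is not the dissertation's implementation: the function name `quartile_threshold`, the square strength matrix, the use of absolute values, and the choice of the lower quartile (25th percentile) are all assumptions made here, since the abstract does not specify them.

```python
import numpy as np

def quartile_threshold(weights, q=25.0):
    """Zero out, row by row, the weak entries of a relationship-strength matrix.

    weights[i, j] is the estimated strength of the relationship between
    variable i and variable j (square matrix assumed for this sketch).
    The lower quartile (q=25) and the absolute value are assumptions.
    """
    weights = np.asarray(weights, dtype=float)
    strengths = np.abs(weights)
    n = strengths.shape[0]
    off_diag = ~np.eye(n, dtype=bool)
    # Strengths between each variable and all OTHER variables: shape (n, n - 1).
    others = strengths[off_diag].reshape(n, n - 1)
    # One cut-off per variable: the q-th percentile of its own row.
    cutoffs = np.percentile(others, q, axis=1, keepdims=True)
    # Keep off-diagonal entries at or above the row's cut-off.
    keep = (strengths >= cutoffs) & off_diag
    pruned = np.where(keep, weights, 0.0)
    # The boolean mask marks the entries a later pass would re-estimate.
    return pruned, keep
```

In a recursive scheme of the kind the abstract describes, the returned mask would restrict the next estimation step to the surviving relationships; how that re-estimation is regularized is not stated in the abstract and is left out of this sketch.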
Keywords/Search Tags: Genomics data, Reconstruction, Regulatory network, Data integration, Multi-layer networks