
Research On Inferring Gene Regulatory Network Structure Based On Information Theory

Posted on: 2018-01-30 | Degree: Doctor | Type: Dissertation
Country: China | Candidate: W Liu | Full Text: PDF
GTID: 1310330542974486 | Subject: Computer application technology
Abstract/Summary:
The completion of the Human Genome Project marked the entry of modern life sciences research into the era of systems biology. Systems biology is not just an emerging field; more importantly, it represents a new approach to biological research. Researchers have gradually realized that they cannot limit themselves to studying a single gene, but should comprehensively explore the regulation among genes from a system-level point of view, study the operating mechanism of the whole living system, and ultimately decipher the secrets of heredity. With the rapid development of high-throughput technology, a large body of research has produced vast amounts of gene expression data. Mining biological regulatory relationships and regulation mechanisms from these data is one of the most challenging biological problems of the post-genomic era. The inference of gene regulatory network structure aims to construct, from gene expression data, the network formed by the regulatory relationships among genes. Research on this inference problem is therefore of great significance. In this dissertation, we take gene expression data as the research object and information theory as the theoretical background. In view of several problems with current network inference methods, we study algorithms for inferring gene regulatory networks. The main work is as follows.

(1) Most network inference models based on information theory infer the network topology from a single network property. We therefore propose a novel regulatory network inference method based on network topology, called LDCNET. The LDCNET algorithm combines node centrality from network topology theory with mutual information from information theory. The algorithm first uses mutual information to initialize and preprocess the relationships between genes. Second, the degree centrality of each gene is calculated, and all genes are arranged in descending order of degree centrality. When different genes have the same centrality during sorting, they are reordered according to a strategy involving the degree centrality of the genes adjacent to the given target gene. Finally, regulatory genes are selected for each gene in the sequence, and the regulatory relationships of all genes are integrated into a complete regulatory network structure. The validity of the algorithm was verified on four data sets, and the experimental results show that the proposed algorithm performs well at inferring network structure.

(2) Gene expression data are typically high-dimensional with small sample sizes. We therefore propose a novel regulatory network inference method called maximum-relevance and maximum-significance network (MRMSn), which transforms the problem of recovering a network into a binary classification problem in which regulator genes are selected for each gene. To solve the latter problem effectively, we present a feature gene selection model based on mutual information and entropy reduction. The first-order incremental search algorithm in the model ensures that the selected regulatory genes approximately attain the optimum of the model. Because the model involves weights for the different features, we also propose a method that sets these weights automatically based on local density. Finally, a constraint is applied to adjust all of the regulatory relationships according to the selected regulator genes, yielding the complete network structure. The validity of the algorithm was verified on five data sets, and the results show that the algorithm performs well at inferring network structure.

(3) Gene expression data are typically noisy and exhibit nonlinear correlation, which leads to a notoriously high false positive rate in inferred gene regulatory network structures. It is therefore necessary to remove redundant regulations with a redundancy reduction strategy. We propose a novel network inference method with controlled redundancy, which is an extension of the MRNET algorithm. In our method, a new redundancy control strategy based on information theory and clustering is used to reduce the redundant regulatory relationships caused by nonlinear correlation. The method then combines mutual information (MI) with conditional mutual information to assign a "best-first" regulator gene to each target gene, which reduces the redundant relationships caused by noisy data. Finally, the candidate gene set of each gene and its "best-first" regulatory gene are used as input to the MRNET algorithm, and the final network structure is obtained. The proposed method was validated on six standard data sets. The results show that the redundancy control strategy can effectively improve the accuracy of the inferred network structure.
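
The first two steps described for LDCNET in contribution (1), mutual information preprocessing followed by ranking genes by degree centrality, can be illustrated with a minimal Python sketch. This is not the dissertation's implementation. Everything below is an assumption made for illustration: the function names, the equal-width discretization, the plug-in mutual information estimate, the fixed `threshold` used to decide which gene pairs count as adjacent, and the tie-break (the sum of the neighbours' degree centralities), which merely stands in for the unspecified reordering strategy mentioned in the abstract.

```python
import math
from collections import Counter


def discretize(values, bins=3):
    """Map continuous expression values to equal-width bin indices (assumed scheme)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / bins
    return [min(int((v - lo) / width), bins - 1) for v in values]


def mutual_information(x, y):
    """Plug-in estimate of I(X;Y) in bits for two equally long discrete sequences."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    return sum(
        (c / n) * math.log2(c * n / (px[a] * py[b]))
        for (a, b), c in pxy.items()
    )


def rank_by_degree_centrality(expr, threshold=0.5, bins=3):
    """Rank genes by degree centrality in a thresholded mutual information graph.

    expr maps a gene name to its list of expression values (one per sample).
    Returns (genes in descending centrality order, centrality per gene).
    """
    genes = list(expr)
    disc = {g: discretize(expr[g], bins) for g in genes}

    # Step 1: connect gene pairs whose estimated MI reaches the assumed threshold.
    neighbours = {g: set() for g in genes}
    for i, g in enumerate(genes):
        for h in genes[i + 1:]:
            if mutual_information(disc[g], disc[h]) >= threshold:
                neighbours[g].add(h)
                neighbours[h].add(g)

    # Step 2: degree centrality = number of neighbours / (number of genes - 1).
    denom = max(len(genes) - 1, 1)
    centrality = {g: len(neighbours[g]) / denom for g in genes}

    # Ties are broken by the summed centrality of adjacent genes (hypothetical
    # stand-in for the dissertation's strategy), then by gene name.
    def sort_key(g):
        neighbour_score = sum(centrality[h] for h in neighbours[g])
        return (-centrality[g], -neighbour_score, g)

    return sorted(genes, key=sort_key), centrality
```

A call such as `rank_by_degree_centrality({"g1": [...], "g2": [...]}, threshold=0.5, bins=2)` returns the ordered gene list together with the centrality values. The subsequent regulator selection step is not sketched here because the abstract does not describe it in enough detail.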
Keywords/Search Tags: Systems biology, Information theory, Gene regulatory network structure, Network topological centrality, Maximum-relevance and maximum-significance, Controlled redundancy