| A gene regulatory network (GRN) is a type of biological network used to express the complex regulatory relationships among genes, and it is one of the most important means of understanding gene function. Constructing GRNs helps to reveal the mechanisms of cellular activity and promotes understanding of the interactions between genes. It has become a hot research topic in bioinformatics and systems biology. In recent years, GRNs have been applied to the prediction of disease genes, drug target screening, and other areas, with far-reaching influence on the early diagnosis of diseases, individualized treatment, and drug development. Accordingly, GRN modeling and its applications are of both theoretical value and practical significance.

The dissertation focuses on GRNs and presents several GRN modeling algorithms adapted to different types of GRNs. Moreover, differential expression network modeling is applied to the prediction of oncogenes associated with breast cancer, and six breast cancer related genes are identified. The main innovations and contributions of the dissertation are as follows:

1. A new algorithm that combines constraint-based and search-based learning is proposed to construct causal regulatory networks. To balance computational complexity against the accuracy of the reconstructed network as the network scale grows rapidly, a constraint test is added to the score-and-search process. In this process, conditional independence tests are used to compress the search space. Although these tests add computation, the accuracy of the inferred network is improved. Next, the BDeu score is introduced to measure the quality of the network topology. Based on this score, heuristic search is employed to ensure accuracy and accelerate the search process.
The experimental results show that, on simulated and real data, the method improves search efficiency and solution quality compared with the classic constraint-based and search-based algorithms (TPDA and PC). The results also show that search-based methods are considerably more accurate but time-consuming, whereas constraint-based methods are faster but less accurate. Compared with these two traditional approaches, the proposed algorithm combines the advantages of both: it improves accuracy with little additional computation and constructs causal gene regulatory networks effectively.

2. A new algorithm based on a statistical test of regression error is presented to construct co-expression gene regulatory networks. To address the low speed of large-scale network modeling, a fast algorithm is proposed for constructing large-scale co-expression gene regulatory networks (with thousands or even tens of thousands of genes). Because traditional algorithms are computationally inefficient when constructing large-scale networks, effective acceleration of the modeling becomes very important. The presented algorithm identifies significant regulatory relationships through a test of regression error. To overcome the strong noise in gene expression data, the method employs an autoregressive update to eliminate noise effects and improve accuracy. Finally, the approach uses the Akaike information criterion (AIC) to determine convergence of the iteration and achieves rapid construction of the GRN. Experimental results show that the proposed algorithm effectively improves construction speed and that its accuracy is superior to that of two traditional methods, glasso and BoostiGraph. It can effectively construct co-expression gene networks, especially when the noise in the gene data is strong.

3. A new algorithm based on the scale-free feature is put forward to construct complex gene regulatory networks. The GRN is a typical complex network.
As one of the most important characteristics of complex networks, the scale-free feature can provide a priori knowledge of the network structure. To exploit and reflect the scale-free feature of complex regulatory networks, Bayesian inference is performed using prior knowledge of the scale-free network, and computational efficiency is improved through a Boosting algorithm. Experimental results on simulated and real data show that the algorithm improves the accuracy and robustness of GRN modeling compared with the NIMOO, Lasso, and NIR modeling algorithms. To a certain extent, the algorithm meets the demands of modeling complex networks and of describing the scale-free feature. The experimental results also indicate that appropriate prior knowledge not only yields a reconstructed GRN consistent with the real biological network but also improves modeling accuracy.

4. A new algorithm based on hidden variable sampling is proposed to construct dynamic GRNs. Modeling dynamic GRNs from time series data is an important problem in systems biology and helps to promote understanding of complex systems. To address the heavy computation and low accuracy of building dynamic GRNs, the presented algorithm, lvMCMC, introduces multiple latent variables, such as the number and locations of the break points. Then, based on the dynamic Bayesian network model, Markov chain Monte Carlo sampling is employed to infer the states of the hidden variables. The network structures are searched during the time-interval segmentation. Finally, the topology of the time-varying network is generated. Experimental results on simulated and real data show that lvMCMC is more efficient than information-theoretic methods and can reconstruct dynamic GRNs effectively.

5. A new prediction method based on differential network modeling is proposed to identify cancer related genes.
To reveal the molecular mechanisms of cancer occurrence and development and to promote the discovery of genes as anti-cancer drug targets, the prediction of cancer related genes has become a hotspot of current bioinformatics research. In this study, to address the large number of candidate results and the high uncertainty in cancer related gene prediction, the method first discovers differentially expressed genes from microarray data. It then constructs a differential expression network to screen out irrelevant genes and increase prediction accuracy. Next, candidate genes are identified in the differential network, and enrichment analysis based on GO information and KEGG pathways is performed to examine the significance of the detected genes. Ultimately, the cancer related genes are obtained. Experiments on breast cancer gene expression data identified six breast cancer associated genes. Gene set enrichment analysis and overlap analysis indicate that the predicted breast cancer related genes have distinct biological significance, which further supports the feasibility and effectiveness of the proposed method. |
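
The differential network step in contribution 5 can be illustrated with a minimal sketch. The abstract does not say how the differential expression network is built, so everything below is an assumption made for illustration: an edge is drawn between two genes when their Pearson correlation changes by more than a threshold between normal and tumor samples, and candidate genes are ranked by how many such edges touch them. The function names (`pearson`, `differential_edges`, `rank_candidates`), the threshold value, and the toy expression values are all hypothetical and are not taken from the dissertation.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def differential_edges(normal, tumor, threshold=1.0):
    """Return gene pairs whose correlation changes by more than `threshold`
    between conditions. `normal` and `tumor` map gene name -> expression list.
    The change-in-correlation rule is an assumed stand-in, not the author's method."""
    edges = {}
    for g1, g2 in combinations(sorted(normal), 2):
        delta = pearson(tumor[g1], tumor[g2]) - pearson(normal[g1], normal[g2])
        if abs(delta) > threshold:
            edges[(g1, g2)] = delta
    return edges


def rank_candidates(edges):
    """Rank genes by the number of differential edges touching them (ties by name)."""
    degree = {}
    for g1, g2 in edges:
        degree[g1] = degree.get(g1, 0) + 1
        degree[g2] = degree.get(g2, 0) + 1
    return sorted(degree.items(), key=lambda kv: (-kv[1], kv[0]))


# Toy data invented for illustration only.
normal = {
    "A": [1.0, 2.0, 3.0, 4.0],
    "B": [1.1, 2.1, 2.9, 4.2],  # tracks A in normal samples
    "C": [4.0, 1.0, 3.0, 2.0],
}
tumor = {
    "A": [1.0, 2.0, 3.0, 4.0],
    "B": [4.0, 3.1, 2.0, 0.9],  # anti-correlated with A in tumor samples
    "C": [4.1, 0.9, 3.2, 2.1],
}

edges = differential_edges(normal, tumor)
print(rank_candidates(edges))
```

In this toy example only the pair whose co-expression flips sign between conditions survives the threshold, which mirrors the stated goal of screening out irrelevant genes before enrichment analysis. A real pipeline would also need a significance test and multiple-testing correction, which this sketch omits.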