
Data Mining And Dynamic Modeling For Disease Of Biology

Posted on: 2015-06-03
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y Wang
Full Text: PDF
GTID: 1224330476453962
Subject: Biology
Abstract/Summary:
In the post-genome era, handling biological data of every kind has become an important task of bioinformatics. Successfully distinguishing between disease classes and selecting informative features from big data are of great importance to disease classification, and understanding how gene expression is regulated under different conditions through the methods of systems biology is vital in molecular biology and disease diagnosis. However, existing learning algorithms fail to take the characteristics of different diseases into consideration, and are therefore limited in their ability to capture all key features when mining the associated high-throughput data. In particular, the challenges in modeling gene regulatory networks, such as the small sample size problem, the complex dynamics, and the nonlinear relationships among genes, restrict our understanding of how gene expression is regulated under different conditions. One major cause of these problems is that the number of features related to a disease is large compared with the size of the available biological experimental data. This dissertation therefore focuses on the molecular characteristics of several diseases, aiming to elucidate the quantitative kinetic mechanisms associated with them. Several algorithms have been designed specifically for this purpose. The dissertation includes three parts:

1. An algorithm named "Feature Merging and Selection algorithm (FMS)" has been designed to handle 16S rRNA expression datasets from pneumonia and dental patients, with the aim of classifying microbiota-associated diseases. The algorithm fully compresses the feature space while still conserving enough of the original features. Moreover, its results are easier to interpret, because the transformed feature combinations learned by the algorithm do not overlap. Validated on metagenome data from two different diseases, the algorithm achieves a higher discrimination rate than the control methods with a lower dimension, which makes the model more stable.

2. Biological experiments have shown that the genes Maff and Egr3 are highly expressed in normal mouse hematopoietic stem cells in a leukemic environment, and that the two genes affect the cell cycle in opposite ways. Starting from these results, this dissertation screens quantitative models of how Maff and Egr3 regulate the cell cycle by "exhaustive method and model selection", with the help of bioinformatics network resources. The model is validated by simulating the expression levels of a series of key cell-cycle molecules and by scanning sequences for binding sites. Kinetic simulation then shows that Egr3 can strongly inhibit the cell cycle, and that the ability of Maff to promote the cell cycle is limited by Egr3, which confirms the hypothesis of a "[…] and self-protected" mechanism.

3. A parameter estimation method named "Small Sample Iterative Optimization algorithm (SSIO)" has been designed to quantitatively describe nonlinear gene regulatory relationships from gene expression data with small sample size. The algorithm can estimate the parameters correctly and accurately under small-sample conditions, and can construct a quantitative dynamic model of the gene regulatory network controlling adipogenesis. The algorithm is validated on the regulatory networks of both human and mouse. In addition, after identifying the genes differentially expressed before and after differentiation, the algorithm compares and calculates a series of additional feedbacks, which are validated. Finally, a statistical comparison of parameters, kinetic results, and regulation strength reveals several detailed differences between the quantified human and mouse adipocyte networks, suggesting that the regulation efficiency of the transcription factors differs between the two species.
Keywords/Search Tags:dimension reduction, model selection, small sample problem, kinetics simulation, quantitative regulatory network optimization