
A Research And Application On The Simulated Space Environment Health Monitoring Using Multi-omics Data

Posted on: 2017-04-15
Degree: Doctor
Type: Dissertation
Country: China
Candidate: X Li
GTID: 1224330503969917
Subject: Biomedical engineering
Abstract/Summary:
During the simulated interplanetary human spaceflight (SIHS) experiments, many human stress phenotypes changed (e.g., endocrine dysfunction, phenotypic stress disorder, insomnia, and anxiety) in response to the simulated extreme environment, and multi-omics molecules (genomic, transcriptomic and epigenetic) play an important role in these processes. With the development of high-throughput technologies, the volume of multi-omics data is growing rapidly. Effectively investigating the associations between billions of multi-omics molecules and pathological and physiological phenotypes, and applying them in a health monitoring and evaluation system, can provide insights into health prediction in extreme environments and the discovery of new space medicine knowledge. In this thesis, we focus on predicting human phenotypic changes in a simulated space environment through novel algorithm development, data mining, physiological and pathological outcome modeling, and biomarker discovery. These applications have been adopted in space medical health prediction and in cancer prognosis and classification modeling, which provides insights into a more integrated practice of personalized medicine. The main findings of this study are as follows:
First, we proposed six normalization algorithms according to the health prevention requirements of SIHS, including RNA-seq normalization and the calculation of mRNA/lncRNA expression as RPKM/TPM values; grouping the beta values of methylation probes of each sample into a matrix from the DNA methylation chip datasets; grouping the microRNA quantification data of each sample into a TPM/RPKM matrix from the microRNA-Seq profiles; and the identification of somatic/germline mutations from whole-genome sequencing and whole-exome sequencing data.
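As an illustration of the RPKM/TPM calculation mentioned above, the following is a minimal sketch using the standard textbook definitions of both measures. It is not taken from the thesis or the CAPM package; the function names `rpkm` and `tpm` and the toy counts are assumptions made for this example only.

```python
def rpkm(counts, lengths_bp):
    """Reads Per Kilobase of transcript per Million mapped reads.

    counts: raw read counts per gene for one sample.
    lengths_bp: gene (or transcript) lengths in base pairs.
    """
    total = sum(counts)
    if total == 0:
        raise ValueError("sample has no mapped reads")
    # count / (length in kb) / (total reads in millions)
    return [c * 1e9 / (length * total) for c, length in zip(counts, lengths_bp)]


def tpm(counts, lengths_bp):
    """Transcripts Per Million: length-normalize first, then rescale to 1e6."""
    rates = [c / length for c, length in zip(counts, lengths_bp)]
    denom = sum(rates)
    if denom == 0:
        raise ValueError("sample has no mapped reads")
    return [r * 1e6 / denom for r in rates]


if __name__ == "__main__":
    # Hypothetical toy sample: three genes with made-up counts and lengths.
    counts = [100, 200, 700]
    lengths = [1000, 2000, 1000]
    print(rpkm(counts, lengths))
    print(tpm(counts, lengths))
```

Unlike RPKM values, TPM values always sum to one million within a sample, which makes per-sample columns of a TPM matrix directly comparable in scale.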
In addition, we proposed a machine learning library that includes seven machine learning algorithms (SVM, Naive Bayes, KNN, Random Forest, Decision Trees, Neural Network, and Linear Regression) and four feature selection approaches (PFS, RF-IS, Lasso and SVM-RFE). We also investigated a “circulation algorithm”-based approach for optimizing feature selection. Moreover, we developed personalized analysis programs for survival analysis and literature retrieval to analyze the biomarkers with the best predictive performance extracted from the machine learning models. The survival analysis functions include univariate and multivariate Cox regression analysis and the log-rank test. The literature retrieval function covers two databases, PubMed and CNKI, and supports both Chinese and English retrieval through Perl program functions. Given keywords submitted to a database, it returns the matching documents. This kind of program is a commonly used data mining strategy for identifying relationships between diseases and omics biomarkers. Finally, we developed an R-based software package, the Clinical & Artificial Priorization Modeling Package (CAPM), to implement the above algorithms; tests performed with the package achieved fairly good results. As a novel software tool, the CAPM package provides both a theoretical basis and decision support for human health assessment and prevention.
Secondly, we performed a longitudinal analysis of the “Mars500 long-term isolation experiment” datasets of DNA methylation BeadChip and blood biochemical data, and built predictive models of long-term blood glucose changes. Our analysis revealed the dynamic nature of epigenetic patterns associated with isolation-induced blood glucose dysregulation. We also used the CAPM package to build the glucose model using glucose-related DNA methylation probes.
Through feature extraction approaches, we identified 151 epigenetic factors that predicted the glucose changes in the Mars500 (M500) crew well, and 18.7% of these epigenetic signatures were found to be differentially methylated in T2D-specific populations. The results suggest that our epigenetics-based, environment-controlled models provide novel insights into the early dynamic etiology of complex diseases (such as type 2 diabetes). We established a specific temporal association between glucose changes and DNA methylation remodeling, which not only helps to identify glucose-associated epigenetic factors but also provides insights into long-term biochemical prediction.
Thirdly, we built microRNA-based physiological and psychological predictive models in a sleep deprivation experiment. This study is based on a sleep deprivation experiment that measured microRNA expression, blood biochemical levels and a psychological contract inventory. We used dimensionality reduction and data mining approaches to identify microRNA signatures associated with biochemical and psychological changes, and built 12 physiological and psychological predictive models using CAPM approaches. As prospective research, this work explored the use of blood microRNAs as predictive signatures, which predicted physiological and psychological changes during the sleep deprivation experiment well. The microRNA signatures identified by the models were significantly enriched in psychologically related tissues and pathways (e.g., brain, platelet, and long-term depression). These results provide novel insights into identifying microRNA signatures associated with physiological and psychological changes using machine learning models.
Lastly, we investigated the CAPM method for cancer prognosis and classification modeling using multi-omics profiles. To explore the use of CAPM in complex diseases, we further developed a CAPM-based prognostic modeling pipeline (IDFO), composed of 6 machine learning approaches and 4 feature selection algorithms, to predict patient survival outcomes by identifying prognosis-related biomarkers using multi-type molecular data (mRNA, microRNA, DNA methylation, and lncRNA) from 3197 samples of five cancer types. We assessed the predictive performance of both single-type molecular data (20 groups) and integrated multi-type molecular data (20 groups) in patient survival stratification, and compared their relative importance in each cancer type. Survival analysis using multivariate Cox regression was performed to investigate the impact of the IDFO-identified markers and traditional variables on clinical outcome. Furthermore, we built a multiple primary lung cancer classification model, which identified novel somatic mutations of EGFR-L858R and MYCL. Our study provides insight into systematically understanding the prognostic performance of diverse molecular data in both single and aggregate patterns, which may serve as a specific reference for the development of drug targets and subsequent clinical research.
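Survival stratification of the kind described above is commonly checked with a log-rank test, which the abstract lists among the survival analysis functions. The following is a minimal sketch of the standard two-group log-rank test, not the CAPM or IDFO implementation; the function name `logrank_test`, its argument layout and the toy data are assumptions made for this example.

```python
import math


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi_square, p_value).

    times_*: follow-up times; events_*: 1 if the event (e.g., death) was
    observed at that time, 0 if the subject was censored.
    """
    samples = [(t, e, 0) for t, e in zip(times_a, events_a)]
    samples += [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in samples if e})

    observed_a = expected_a = variance = 0.0
    for t in event_times:
        n = sum(1 for u, _, _ in samples if u >= t)                # at risk, both groups
        n_a = sum(1 for u, _, g in samples if u >= t and g == 0)   # at risk, group A
        d = sum(1 for u, e, _ in samples if u == t and e)          # events at t
        d_a = sum(1 for u, e, g in samples if u == t and e and g == 0)
        observed_a += d_a
        expected_a += d * n_a / n
        if n > 1:
            # Hypergeometric variance of the group-A event count at time t.
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)

    if variance == 0:
        raise ValueError("variance is zero; the groups cannot be compared")
    chi_square = (observed_a - expected_a) ** 2 / variance
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi_square / 2))
    return chi_square, p_value


if __name__ == "__main__":
    # Hypothetical toy data: a high-risk and a low-risk group, no censoring.
    chi2, p = logrank_test([1, 2, 3], [1, 1, 1], [4, 5, 6], [1, 1, 1])
    print(f"chi-square = {chi2:.3f}, p = {p:.4f}")
```

A small p-value indicates that the survival curves of the two groups differ more than chance would explain; a model-derived risk score would typically be dichotomized (for example, at its median) to form the two groups.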
Keywords/Search Tags: Simulated interplanetary environment, High-throughput omics data, Machine learning, Health prediction, Software development