Molecular Subtyping And Genome Annotation Based On Integrative Multi-omics Data Analysis

Posted on: 2022-10-16
Degree: Doctor
Type: Dissertation
Country: China
Candidate: H T Li
Full Text: PDF
GTID: 1520307058996499
Subject: Biomedical engineering
Abstract/Summary:
With the rapid development of high-throughput sequencing technology, sequencing has become faster and cheaper. The technology is increasingly applied to biomedical problems and is especially widely used in the study of complex diseases. Big-data-driven, multi-modality data fusion models have greatly improved the efficiency of human disease research and provided new and more effective methods for the diagnosis, prevention, and treatment of complex diseases. They also offer a novel and efficient way to understand the mechanisms of biological systems and reveal the causes of disease. Multi-omics data fusion is therefore an important research direction in bioinformatics.

This thesis systematically investigates multi-omics data integration methods for analyzing complex disease-state changes and cell-state transformations. We developed bioinformatics analysis algorithms that achieve accurate molecular subtyping of disease and predict the conversion of complex disease. We explored genomic variants that affect disease or arise during cell-state transformation, and integrated multi-omics data to annotate their function. Using artificial intelligence models built on integrated biomedical multi-omics data, we systematically study the genome, transcriptome, epigenome, and proteome to gain comprehensive insight into the molecular mechanisms involved in the development of complex diseases and cell-state transformation. The works contained in this thesis are as follows:

1. We constructed a molecular subtyping method for mild cognitive impairment (MCI) based on multi-omics integration analysis. MCI is a neurological disorder known as the prodromal stage of Alzheimer’s disease (AD). We integrated two types of omics data, genetic polymorphisms and gene expression profiles from the peripheral blood samples of MCI patients in the ADNI-1 dataset, and used the similarity network fusion (SNF) method to identify subtypes with biological and clinical significance. MCI patients from the ADNI-2 dataset were used to test the effectiveness and reliability of the method. Kaplan-Meier analysis with a log-rank test of the conversion from MCI to AD between the two subtypes gave a p-value of 4.58×10⁻³. In addition, we compared patients in the two MCI subtypes by changes in Alzheimer’s disease cognitive scales and MRI images, and by significantly enriched pathways based on differentially expressed genes. This study showed that MCI is a heterogeneous disease: AD development differs significantly between the two MCI subtypes.

2. We proposed a prediction model that integrates structural magnetic resonance imaging (sMRI) data and multi-omics data. Building on the molecular subtyping of MCI patients in study (1), we constructed a subtyping-based prediction model that predicts the conversion from MCI to AD within three years for each subtype separately. The classifier is a variational Bayes approximation model based on multiple kernel learning, trained on sMRI, SNP, and gene expression profiles from MCI patients in the ADNI dataset. The subtyping-based prediction model achieved an overall AUC of 0.83, compared with an AUC of 0.78 for the model without subtyping. In comparison with similar prediction results, our method also performed better than MRI-based methods. These results show that identifying MCI patient subtypes with omics data improves the accuracy of predicting the conversion from MCI to AD.

3. We developed a strategy for functional annotation of AD-associated SNPs based on chromatin 3D structure information. We leveraged three-dimensional chromatin structure information, namely Hi-C and promoter capture Hi-C data, in two brain tissues (hippocampus and brain cortex) to annotate AD-associated SNPs. Expression quantitative trait locus (eQTL) information and functional enrichment analysis indicated a close relationship between the putative target genes regulated by AD-associated SNPs and the molecular mechanisms of AD. To demonstrate the validity of the strategy, we performed functional annotation from the perspective of 3D chromatin structure on three SNP-target gene pairs (rs2373115-NARS2, rs6656401-CR1, and rs3776011-ACSL6) and gained new insights into the pathogenic mechanisms of AD risk loci. Identifying target genes based on 3D chromatin structure at genome-wide association study risk loci will lead to a greater understanding of the mechanisms that influence AD risk and prognosis.

4. We established a strategy for functional annotation of genomic variant loci during stem cell differentiation using whole-genome sequencing data generated in our lab. Genomic variations were determined by whole-genome sequencing of three pairs of pluripotent stem cell lines and their corresponding BMP4-induced trophoblast cell lines, and were integrated with epigenomic and transcriptomic data. The variations were enriched in introns. We found that about 45% of the differentially expressed genes in trophoblasts were associated with genomic variations. We inferred that, during differentiation, an increase in the expression level of the MEF2C gene is due to a genomic variation on chromosome 5 (5:88179358 A>G), which lies at a binding site of transcription factors (TFs) regulating MEF2C. Allele G shows a higher affinity for the TFs in the induced cells. The increased expression of MEF2C leads to increased expression of the target genes of the TF MEF2C, subsequently affecting differentiation.

This thesis investigates the development of complex diseases and cell-state transformation based on multi-omics data, with a particular focus on subtyping and early diagnosis of MCI patients. The proposed method is non-invasive and cost-effective, and can be used to identify molecular subtypes of MCI in clinical applications.
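
As an illustration of the log-rank comparison mentioned in work 1, the following is a minimal sketch of a two-group log-rank test written with the Python standard library only. It is not the thesis's implementation: the function name `logrank_test`, its argument layout, and the follow-up times in the usage lines are all hypothetical and serve only to show how conversion times in two subtypes could be compared.

```python
import math


def logrank_test(time_a, event_a, time_b, event_b):
    """Two-group log-rank test (hypothetical helper, not from the thesis).

    time_*  : follow-up times (e.g. months until AD conversion or censoring)
    event_* : 1 if conversion was observed, 0 if the patient was censored
    Returns (chi-square statistic, two-sided p-value with 1 degree of freedom).
    """
    samples = [(t, e, 0) for t, e in zip(time_a, event_a)]
    samples += [(t, e, 1) for t, e in zip(time_b, event_b)]

    # Distinct times at which at least one event (conversion) occurred.
    event_times = sorted({t for t, e, _ in samples if e})

    observed_minus_expected = 0.0
    variance = 0.0
    for t in event_times:
        # Patients still at risk in each group just before time t.
        n_a = sum(1 for s, _, g in samples if g == 0 and s >= t)
        n_b = sum(1 for s, _, g in samples if g == 1 and s >= t)
        # Events observed exactly at time t.
        d_a = sum(1 for s, e, g in samples if g == 0 and s == t and e)
        d_b = sum(1 for s, e, g in samples if g == 1 and s == t and e)
        n, d = n_a + n_b, d_a + d_b

        # Expected events in group A under the null hypothesis.
        observed_minus_expected += d_a - d * n_a / n
        if n > 1:
            # Hypergeometric variance of the event count in group A.
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)

    if variance == 0.0:
        return 0.0, 1.0
    chi2 = observed_minus_expected ** 2 / variance
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Made-up follow-up times for two hypothetical subtypes; all conversions observed.
chi2, p = logrank_test([1, 2, 3, 4, 5], [1] * 5, [6, 7, 8, 9, 10], [1] * 5)
print(f"chi2 = {chi2:.2f}, p = {p:.4f}")
```

A small p-value here means the conversion curves of the two groups differ more than chance would explain, which is the kind of evidence the abstract reports for the two MCI subtypes.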
Keywords/Search Tags: Multi-omics integration analysis, Omics big data, Alzheimer’s disease, Molecular subtyping, Functional annotation, Machine learning