Identification Of Topological Structures And Modeling Of Epigenetic Regulation In The 3D Genome

Posted on: 2020-01-02
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y S Ye
GTID: 1360330602463866
Subject: Computer application technology
Abstract/Summary:
The highly complex folding of mammalian chromosomes within the nucleus is increasingly recognized as an important factor in gene regulation, intracellular biological processes, cell differentiation, and evolution. Chromatin conformation capture (3C) technology and its variants have gradually revealed multi-scale topological domains in the three-dimensional (3D) genome architecture, ranging from Mb-scale active or inactive compartments, through topologically associating domains (TADs) and sub-TADs, down to finer chromatin loops. These structures determine the location and topological status of genes and regulatory elements, which in turn affects the regulatory program of the genome. The development of multi-omics single-cell technologies provides an unprecedented opportunity to resolve intracellular biological processes at high resolution and to study cell type-specific or tissue-specific epigenetic regulation of the genome. The regulatory program of the genome is driven by combinations of transcription factors (TFs) that bind to regulatory control regions, regulate the transcription of target genes, and determine the occurrence of intracellular biological processes.

This dissertation investigates multi-scale topological domains and epigenetic regulation programs in the nucleus. It proposes a generic and efficient method, MSTD, to identify multi-scale topological domains from both asymmetric and symmetric 3D genomic datasets; it then proposes a powerful and robust circular trajectory reconstruction method, CIRCLET, which resolves the cell-cycle phases of single cells by considering multi-scale features of chromosomal architecture; finally, it detects the TF combinatorial interactions that drive epigenetic regulation of the genome. The main research work is as follows:

1. Existing methods for domain detection were designed only for symmetric Hi-C maps and ignore long-range interaction structures between domains. To this end, we propose MSTD, a generic and efficient method to identify multi-scale topological domains from a variety of 3D genomic datasets. We first applied MSTD to detect promoter-anchored interaction domains (PADs) from promoter capture Hi-C datasets across 17 primary blood cell types. The boundaries of PADs are significantly enriched with one epigenetic factor or a combination of several. Moreover, PADs are significantly conserved between functionally similar cell types in terms of domain regions and expression states. Cell type-specific PADs are involved in distinct cell type-specific activities and regulatory events through the dynamic interactions within them. We also employed MSTD to define multi-scale domains from typical symmetric Hi-C datasets and demonstrated its superiority to state-of-the-art methods in terms of accuracy, flexibility, and efficiency.

2. Single-cell Hi-C technology is emerging and provides unprecedented opportunities to elucidate chromosomal dynamics at high resolution. How to characterize the pseudo-time series of single cells from single-cell Hi-C maps is an essential and challenging question. To this end, we developed CIRCLET, a powerful circular trajectory reconstruction tool that resolves the cell-cycle phases of single cells by considering multi-scale features of chromosomal architecture, without requiring a starting cell to be specified. On a collection of cell-cycle Hi-C maps of 1171 single cells, CIRCLET performs best, according to the designed evaluation indexes and verification strategies, when it combines one feature set carrying global information with two feature sets carrying local interaction information. Further division of the reconstructed trajectory into 12 stages helps to accurately characterize the dynamics of chromosomal structures and to explain particular regulatory events along cell-cycle progression. Last but not least, the reconstructed trajectory helps to uncover important regulatory genes related to dynamic sub-structures, providing a novel framework for discovering regulatory regions and even cancer markers at single-cell resolution.

3. Current studies of TF combinatorial regulation are limited by the lack of experimental data from the same cellular environment and by pervasive data noise. Here, we adopt a Bayesian CANDECOMP/PARAFAC (CP) factorization approach (BCPF) to integrate multiple datasets in a network paradigm and determine precise TF interaction landscapes. In the first application, we apply BCPF to integrate three networks, each built from diverse datasets of multiple cell lines from ENCODE, to predict a global and precise TF interaction network. This network yields 38 novel TF interactions with distinct biological functions. In the second application, we apply BCPF to 7 types of cell type TF regulatory networks and predict 7 cell lineage TF interaction networks, respectively. By further exploring their dynamics and modularity, we find that cell lineage-specific hub TFs participate in cell type-specific or lineage-specific regulation by interacting with non-specific TFs. Furthermore, we illustrate the biological function of hub TFs, taking those of the cancer lineage and the blood lineage as examples. Taken together, our integrative analysis provides a more precise and extensive description of human TF combinatorial interactions.
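
To make the CP idea behind the third contribution concrete, here is a minimal sketch. It is not the dissertation's BCPF: it fits a plain, non-Bayesian rank-1 CP model by alternating least squares. It assumes several TF-TF networks are stacked into a three-way array indexed as tensor[i][j][k] (TF i, TF j, network k). The function names `rank1_cp` and `squared_error`, the layout, and the toy numbers are all illustrative assumptions and do not come from the source.

```python
def rank1_cp(tensor, n_iter=50):
    """Fit tensor[i][j][k] ~ a[i] * b[j] * c[k] by alternating least squares.

    Assumes a non-negative tensor with at least one non-zero entry, so the
    all-ones starting vectors never lead to a division by zero.
    """
    n_i, n_j, n_k = len(tensor), len(tensor[0]), len(tensor[0][0])
    a, b, c = [1.0] * n_i, [1.0] * n_j, [1.0] * n_k

    def sq(v):
        # Squared Euclidean norm of a factor vector.
        return sum(x * x for x in v)

    for _ in range(n_iter):
        # Each update is the exact least-squares solution for one factor
        # while the other two factors are held fixed.
        a = [sum(tensor[i][j][k] * b[j] * c[k]
                 for j in range(n_j) for k in range(n_k)) / (sq(b) * sq(c))
             for i in range(n_i)]
        b = [sum(tensor[i][j][k] * a[i] * c[k]
                 for i in range(n_i) for k in range(n_k)) / (sq(a) * sq(c))
             for j in range(n_j)]
        c = [sum(tensor[i][j][k] * a[i] * b[j]
                 for i in range(n_i) for j in range(n_j)) / (sq(a) * sq(b))
             for k in range(n_k)]
    return a, b, c


def squared_error(tensor, a, b, c):
    """Sum of squared differences between the tensor and its rank-1 fit."""
    return sum((tensor[i][j][k] - a[i] * b[j] * c[k]) ** 2
               for i in range(len(a))
               for j in range(len(b))
               for k in range(len(c)))


# Toy data (made up): one shared 3x3 TF-TF pattern observed in two networks
# with different overall strengths, stacked as tensor[i][j][k].
pattern = [[1.0, 2.0, 0.5], [2.0, 4.0, 1.0], [0.5, 1.0, 0.25]]
strengths = [1.0, 3.0]
tensor = [[[pattern[i][j] * s for s in strengths] for j in range(3)]
          for i in range(3)]

a, b, c = rank1_cp(tensor)
print(squared_error(tensor, a, b, c))
```

Because the toy tensor is exactly rank 1, the printed error should be zero up to floating-point rounding. In this simplified picture, `a` and `b` play the role of TF loadings shared across networks and `c` weights each network. A Bayesian treatment such as the one the abstract describes would place priors on these factors instead of solving plain least squares.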
Keywords/Search Tags: Multi-scale topological domains, single cells, cell cycle, epigenetic regulation, TF interactions