
Detection And Application Of Chromatin Contact Domain Boundary Based On Hi-C Data

Posted on: 2021-03-10
Degree: Master
Type: Thesis
Country: China
Candidate: Y Y Huang
Full Text: PDF
GTID: 2370330626461655
Subject: Signal and Information Processing
Abstract/Summary:
As the basic structural and functional unit of chromatin, the chromatin contact domain is composed of co-regulated gene clusters of different sizes. It is closely related to gene regulation and the directional differentiation of cells, and it is conserved to some degree across species. With the rapid development of chromosome conformation capture and its derivative technologies, especially high-throughput Hi-C, data on three-dimensional chromatin interactions are becoming increasingly abundant, which makes it possible to locate and detect chromatin contact domains and their boundaries. This area has become an important topic in epigenetic research. However, current tools and algorithms for detecting chromatin contact domains and their boundaries are still very limited, and problems such as poor reproducibility, high running time, and low detection accuracy are common. Proposing new detection methods that build on existing algorithms is therefore key to addressing these shortcomings.

This thesis systematically compares and analyzes two representative types of chromatin interaction domain detection methods, selects the most widely used type, the one-dimensional statistical method, and, building on the existing HiCDB and TopDom algorithms, proposes an insulation-based method: the Hi-C Insulation Density (HiCID) detection algorithm, which characterizes the change in intensity at contact domain boundaries. To improve the signal-to-noise ratio of the original Hi-C data, network enhancement is embedded in the data preprocessing step, and the threshold for domain boundaries is determined from the enrichment of the insulator-binding protein CTCF and of histone modifications. The size and number of sliding windows are optimized according to the characteristics of Hi-C data at different resolutions, which provides favorable conditions for using statistical methods to classify domains, domain boundaries, and non-interacting chromosome gaps. Finally, the characteristics of histone modifications, RNA polymerase II, cohesin subunits, and other components related to gene regulation were analyzed within contact domains and at their boundaries to derive patterns of gene regulation.

Compared with other algorithms based on one-dimensional statistics, the HiCID algorithm proposed in this thesis improves significantly in consistency, accuracy, and robustness, especially in the accuracy with which chromatin domains and their boundaries are located. The insulation density statistic defined in this thesis re-characterizes the distribution of chromatin interaction frequency from the perspective of density changes in the Hi-C contact matrix, and network enhancement improves the quality of the original Hi-C data. In addition, CTCF and histone modification information are introduced to jointly determine the cutoff threshold for domain boundaries, which improves the conservation of the identified boundaries and makes the experimental results more biologically meaningful. In short, the HiCID algorithm has a low rate of missed candidate boundaries in practical applications, becomes more stable as the resolution of the Hi-C data increases, and shows good portability and redundancy. The algorithm can therefore be widely used to detect and identify chromatin contact domains and their boundaries in different cells.
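To make the insulation idea concrete, the following is a minimal sketch of a generic sliding-window insulation score on a toy contact matrix. It is not the HiCID algorithm: the abstract does not define the insulation density statistic, so the scoring rule, the function names (`toy_matrix`, `insulation_scores`, `candidate_boundaries`), and the fixed `max_score` cutoff are all illustrative assumptions. In the thesis the cutoff is derived from CTCF and histone modification enrichment rather than set by hand.

```python
import math
from typing import List

Matrix = List[List[float]]


def toy_matrix(domain_sizes: List[int], inside: float = 10.0,
               outside: float = 1.0) -> Matrix:
    """Build a symmetric block matrix: high contacts within a domain, low between."""
    labels = [d for d, size in enumerate(domain_sizes) for _ in range(size)]
    return [[inside if a == b else outside for b in labels] for a in labels]


def insulation_scores(matrix: Matrix, window: int) -> List[float]:
    """Mean contact frequency across the junction between bin i-1 and bin i.

    For position i the window covers rows i-window..i-1 and columns
    i..i+window-1, i.e. contacts that span the junction. Positions too
    close to the matrix edge are left as NaN.
    """
    n = len(matrix)
    scores = [float("nan")] * n
    for i in range(window, n - window + 1):
        total = 0.0
        for r in range(i - window, i):
            for c in range(i, i + window):
                total += matrix[r][c]
        scores[i] = total / (window * window)
    return scores


def candidate_boundaries(scores: List[float], max_score: float) -> List[int]:
    """Positions that are strict local minima of the score and at most max_score."""
    found = []
    for i in range(1, len(scores) - 1):
        left, mid, right = scores[i - 1], scores[i], scores[i + 1]
        if math.isnan(left) or math.isnan(mid) or math.isnan(right):
            continue  # skip positions without a full neighbourhood
        if mid < left and mid < right and mid <= max_score:
            found.append(i)
    return found


if __name__ == "__main__":
    # Two hypothetical domains of six bins each; the junction is at position 6.
    contacts = toy_matrix([6, 6])
    scores = insulation_scores(contacts, window=2)
    print(candidate_boundaries(scores, max_score=5.0))
```

A low score means few contacts cross the junction, which is what a domain boundary looks like in this simplified picture. The window size plays the role of the sliding-window parameter that the abstract says is tuned to the data resolution.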
Keywords/Search Tags:chromatin contact domain, Hi-C technology, HiCID algorithm, gene characteristic analysis