
3D Genome Structure Recognition Integrated System Construction And Algorithm Development

Posted on: 2022-07-16; Degree: Master; Type: Thesis
Country: China; Candidate: R Duan; Full Text: PDF
GTID: 2480306335956629; Subject: Chemistry
Abstract/Summary:
In the mammalian nucleus, DNA exists as a double helix. The human genome, for example, contains about 3 billion base pairs and would be roughly two meters long if fully extended. To fit inside the nucleus, DNA must therefore be folded into a highly ordered spatial structure. The double-stranded DNA is first wound around histones to form nucleosomes. The nucleosomes assemble into chromatin fibers, and the chromatin fibers are further organized into a three-dimensional spatial structure. To study this structure, chromosome conformation capture (3C) was invented, and as the technology and its derivatives matured, a multi-level three-dimensional chromatin topology was revealed. At the megabase (Mb) scale, chromatin is divided into A/B compartments, which are associated with the activation or repression of the genes they contain. At the kilobase (kb) scale there are topologically associating domains (TADs) and the sub-TADs (subTADs) nested within them. At a still finer scale there are chromatin loops. These structures determine how genes are arranged in the nucleus and which regulatory elements can reach them, thereby affecting transcriptional regulation and, in turn, the biological processes that take place in the cell.

This study addresses the development of a TAD recognition algorithm and of a multi-method integration platform. Based on high-throughput chromosome conformation capture (Hi-C) data on three-dimensional chromatin structure, an efficient algorithm was developed to identify TADs quickly. In addition, to address the problems of current TAD identification methods, a multi-method integration platform was constructed to further improve the efficiency and quality of TAD identification. The main research work of this thesis is as follows:

1. To identify high-quality TADs quickly and accurately, MCluster-Hic, a TAD recognition algorithm based on unsupervised machine learning, is proposed. The Hi-C interaction matrix is transformed into two different weighted networks, and each network is clustered with the Markov clustering algorithm. Finally, a mapping algorithm maps the clusters to TADs. The TAD boundaries recognized by MCluster-Hic show strong enrichment of biological signals and are conserved across different cell types. Comparison with other methods further demonstrates the accuracy and efficiency of MCluster-Hic.

2. TADs have only a descriptive definition: "the chromatin spatial interaction between regions within a TAD is significantly stronger than the interaction between regions of two different TADs". As a result, there is no "gold standard" for measuring the quality of TAD identification, and many algorithms have been developed to detect this structure. However, each single method usually approaches TAD detection from one particular point of view and does not consider the problem comprehensively. The methods also differ in operating platform, input data format and ease of use, which makes it difficult to run several of them together. Finally, recent studies have shown that TAD boundary regions identified by multiple methods are more strongly enriched for some genomic features than those identified by a single method. To address these problems, we designed a multi-modal result integration algorithm and established a corresponding algorithm integration platform. Applying the algorithm on the platform quickly yields the identification results of multiple methods at once, and a reliable, unified set of TAD identification results can then be calculated from them.
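
To make the pipeline in item 1 concrete, the following is a minimal sketch of Markov clustering applied to a Hi-C-like contact matrix, followed by a simple mapping of clusters to contiguous domains. It is not the MCluster-Hic implementation: the abstract does not specify how the two weighted networks are built or how the mapping algorithm works, so this sketch assumes the raw contact matrix is used directly as a single weighted network and that a domain is a run of consecutive bins sharing a cluster. The function names `markov_cluster` and `clusters_to_domains`, and all parameter values, are hypothetical.

```python
import numpy as np


def markov_cluster(adjacency, expansion=2, inflation=2.0, max_iter=100, tol=1e-6):
    """Cluster a weighted network with basic Markov clustering (MCL).

    Returns one integer label per node; nodes with the same label
    belong to the same cluster. Illustrative sketch only.
    """
    m = np.asarray(adjacency, dtype=float).copy()
    n = m.shape[0]
    # Add self-loops (assumed weight: the column maximum, or 1 for empty columns).
    col_max = m.max(axis=0)
    col_max[col_max == 0] = 1.0
    m[np.arange(n), np.arange(n)] = col_max
    # Make the matrix column-stochastic.
    m /= m.sum(axis=0, keepdims=True)
    for _ in range(max_iter):
        prev = m
        m = np.linalg.matrix_power(m, expansion)  # expansion: spread flow
        m = m ** inflation                        # inflation: favour strong flow
        m /= m.sum(axis=0, keepdims=True)
        if np.abs(m - prev).max() < tol:
            break
    # Simplified read-out: label each node by the first row that still
    # carries non-negligible flow into its column.
    return np.argmax(m > 1e-5, axis=0)


def clusters_to_domains(labels, min_bins=2):
    """Map cluster labels to half-open (start, end) runs of consecutive bins."""
    domains = []
    start = 0
    for i in range(1, len(labels) + 1):
        if i == len(labels) or labels[i] != labels[start]:
            if i - start >= min_bins:
                domains.append((start, i))
            start = i
    return domains


# Toy contact matrix (made-up values): two blocks of four bins each,
# with strong contacts inside a block and weak contacts between blocks.
contacts = np.full((8, 8), 0.1)
contacts[:4, :4] = 10.0
contacts[4:, 4:] = 10.0

domains = clusters_to_domains(markov_cluster(contacts))
print(domains)
```

On real Hi-C data, Markov clustering does not by itself guarantee that a cluster consists of adjacent bins, which is presumably why the thesis follows the clustering step with a separate mapping algorithm. The run-based mapping above is only the simplest possible stand-in for that step.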
Keywords/Search Tags:Machine learning, topological association domain, algorithm integration