The Identification Of DNA Motif Pairs On Paired Sequence In Eukaryotes

Posted on: 2020-02-12
Degree: Master
Type: Thesis
Country: China
Candidate: F Xing
Full Text: PDF
GTID: 2370330572477689
Subject: Operational Research and Cybernetics
Abstract/Summary:
Bioinformatics is an interdisciplinary field that integrates biology, applied mathematics, computer science and informatics. In the last decade, with the development of sequencing technology and the implementation of the Human Genome Project, a large volume of genome and protein sequence data has been generated, providing abundant material for further research. At the same time, analyzing these massive data sets poses many challenges. One of the fundamental and most challenging problems in bioinformatics is DNA motif discovery. The main topic of this thesis is how to use a bipartite graph model to discover cis-regulatory DNA motif pairs in eukaryotes.

Gene expression refers to the process of protein synthesis under the direction of a gene. Its key step is gene transcription, which is also the key stage at which gene expression is regulated. In higher eukaryotes, interactions between cis-regulatory elements (or DNA motifs), such as enhancers and promoters, play an important role in regulating the spatiotemporal expression of related genes. Accurately understanding and predicting the interactions between cis-regulatory elements in long-range chromatin-interacting sequence pairs (e.g. promoter-enhancer pairs) will help us to further study the properties of transcription factors, and can also be applied in disease research and pharmaceuticals.

In this thesis, we first introduce the background and significance of DNA motif pair discovery on long-range chromatin-interacting sequence pairs. We then briefly review and analyze the existing cis-regulatory DNA motif discovery algorithms, and elaborate on two DNA motif pair discovery algorithms proposed by Ka-Chun Wong et al.: the de novo DNA motif pair discovery algorithm and the MotifHyades algorithm.

With the study of ChIP-seq data and the application of Hi-C technology, more evidence suggests that coupled DNA motif pairs enriched on long-range chromatin-interacting sequence pairs are related to gene co-expression and protein-DNA interactions. Databases such as GEO and ENCODE provide a large amount of data on long-range chromatin-interacting sequence pairs (e.g. promoter-enhancer pairs). Using these data, we have designed a new algorithm that applies the bipartite graph model to the DNA motif pair discovery problem. The algorithm consists of two main steps: constructing a motif-pair bipartite graph and finding dense subgraphs. We provide a program that runs on the Windows platform and predicts motif pairs on the long-range regulatory region pairs in human K562 cells. Through analysis and comparison of the results, we find that the new algorithm predicts motif pairs on long-range regulatory region pairs quickly and with high accuracy.

The innovation of the algorithm is that it uses a high-order Markov model to represent the dependence between nucleotides, constructs an l-mer bipartite graph, and uses an improved DBSCAN clustering algorithm to solve the motif pair discovery problem on long-range chromatin-interacting sequence pairs. Finally, we establish an effective model for motif pair identification.
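To illustrate the first of the two steps described above, here is a minimal sketch of an l-mer bipartite graph built from paired sequences. It is an assumption-laden simplification, not the thesis's implementation: the function names (`lmers`, `build_lmer_bipartite_graph`, `frequent_edges`), the co-occurrence edge weight, the `min_support` threshold, and the toy sequences are all hypothetical. The high-order Markov model and the improved DBSCAN clustering mentioned in the abstract are not reproduced here; a plain support threshold stands in for the dense-subgraph search.

```python
from collections import Counter
from itertools import product


def lmers(seq, l):
    """Return the set of distinct l-mers occurring in a sequence."""
    return {seq[i:i + l] for i in range(len(seq) - l + 1)}


def build_lmer_bipartite_graph(pairs, l):
    """Build a weighted bipartite graph over l-mers.

    Left nodes are l-mers from the first sequence of each pair, right
    nodes are l-mers from the second. The weight of edge (u, v) is the
    number of sequence pairs in which u occurs on the left and v on
    the right (an assumed weighting, chosen for illustration).
    """
    edges = Counter()
    for left, right in pairs:
        for u, v in product(lmers(left, l), lmers(right, l)):
            edges[(u, v)] += 1
    return edges


def frequent_edges(edges, min_support):
    """Keep edges supported by at least min_support sequence pairs."""
    return {e: w for e, w in edges.items() if w >= min_support}


# Toy paired sequences (hypothetical, not real promoter-enhancer data).
pairs = [
    ("ACGTAC", "TTGACA"),
    ("GACGTT", "CTTGAC"),
    ("ACGTGG", "ATTGAA"),
]
graph = build_lmer_bipartite_graph(pairs, l=4)
strong = frequent_edges(graph, min_support=3)
print(strong)
```

In this toy input, the l-mer pair that co-occurs across all three sequence pairs survives the threshold and would be a candidate seed for a motif pair. A dense-subgraph or clustering step would then group similar l-mer pairs into motif pairs.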
Keywords/Search Tags: Bioinformatics, Transcription factor binding sites, Chromatin interaction, Motif discovery, Motif pair, Promoter, Enhancer, High-order Markov model, DBSCAN clustering algorithm