Study On Prediction Methods Of Long Non-coding RNA Specific Transcription Factor Binding Sites

Posted on: 2018-03-04
Degree: Master
Type: Thesis
Country: China
Candidate: L Wang
GTID: 2310330515456969
Subject: Signal and Information Processing
Abstract/Summary:
Long non-coding RNAs (lncRNAs) are a class of non-coding RNAs whose transcripts are longer than 200 nucleotides. With the widespread application of high-throughput sequencing technology, large numbers of lncRNAs have been identified in many organisms, and a considerable fraction of them have important biological functions, such as involvement in embryonic stem cell death and in cell cycle control. Several models of lncRNA regulatory mechanisms have been proposed; for example, lncRNAs can recruit proteins to DNA or guide their binding there, and can act as scaffolds for protein complexes. However, research on the mechanisms and functions of lncRNAs is still in its infancy. Unlike protein-coding genes, most lncRNAs are not translated into proteins. Nevertheless, lncRNAs share several features with protein-coding genes and their transcripts, the messenger RNAs (mRNAs). At the genomic level, lncRNA genes have histone modification profiles and splicing signals similar to those of protein-coding genes, and lncRNAs are likewise transcribed by RNA polymerase II. Given the different characteristics and functions of lncRNAs and protein-coding genes, we hypothesized that the two classes of genes may be transcribed through different mechanisms, which could in turn contribute to their functional differences.

In this thesis, we investigated the transcription of lncRNAs by predicting and analyzing their transcription factor binding sites (TFBSs). We first conducted a comprehensive review of TFBS prediction algorithms and TFBS model databases. Building on this review, we proposed a bioinformatic method named lncRScan-TFBS, which analyzes lncRNA-associated TFBSs using data from chromatin immunoprecipitation followed by high-throughput sequencing (ChIP-Seq). The main functions of lncRScan-TFBS are reporting basic statistics on ChIP-Seq peaks and genes, computing the distances between transcription start sites (TSSs) and peaks, and performing motif finding and enrichment analysis on the TFBSs of lncRNAs and protein-coding genes.

We applied the lncRScan-TFBS toolkit to a ChIP-Seq dataset from mouse embryonic stem cells. The results show that the TFBSs of the regulatory factors c-Myc, n-Myc, Esrrb, Klf4 and Tcfcp2l1 overlap with those of CTCF, suggesting that all of these factors may participate in the same transcriptional regulatory process. In particular, the TFBSs of c-Myc and n-Myc overlap almost completely, and both overlap largely with those of CTCF, indicating that c-Myc and n-Myc may interact closely with CTCF. No lncRNA-specific TFBSs were detected; however, some transcription factors appear to regulate lncRNAs from a greater distance than they regulate protein-coding genes, which may be one reason why lncRNAs are expressed at low levels. In summary, lncRScan-TFBS can be used to analyze lncRNA TFBSs effectively, helping to infer lncRNA-specific transcription patterns.
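The TSS-to-peak distance report, and the comparison of binding distances for lncRNAs versus protein-coding genes, can be illustrated with a minimal sketch. It assumes peaks are given as hypothetical `(chrom, start, end)` tuples and TSSs as hypothetical `(chrom, position, gene_type)` tuples, takes the peak midpoint as the binding position, and ignores strand. It is an illustration of the idea, not the lncRScan-TFBS implementation.

```python
from bisect import bisect_left
from statistics import median


def index_tss(tss_records):
    """Group hypothetical (chrom, position, gene_type) TSS records into sorted per-chromosome lists."""
    grouped = {}
    for chrom, pos, gene_type in tss_records:
        grouped.setdefault(chrom, []).append((pos, gene_type))
    index = {}
    for chrom, sites in grouped.items():
        sites.sort()
        index[chrom] = ([pos for pos, _ in sites], [gt for _, gt in sites])
    return index


def nearest_tss(peaks, tss_index):
    """Return (absolute distance, gene_type) of the TSS nearest to each peak midpoint.

    Peaks on chromosomes without any annotated TSS are skipped; strand is ignored
    (a simplifying assumption of this sketch).
    """
    hits = []
    for chrom, start, end in peaks:
        if chrom not in tss_index:
            continue
        positions, gene_types = tss_index[chrom]
        mid = (start + end) // 2
        i = bisect_left(positions, mid)
        # The nearest TSS is either just before or at/after the insertion point.
        best = min((j for j in (i - 1, i) if 0 <= j < len(positions)),
                   key=lambda j: abs(positions[j] - mid))
        hits.append((abs(positions[best] - mid), gene_types[best]))
    return hits


def median_distance_by_type(hits):
    """Median peak-to-TSS distance for each gene type (e.g. lncRNA vs protein_coding)."""
    by_type = {}
    for dist, gene_type in hits:
        by_type.setdefault(gene_type, []).append(dist)
    return {gt: median(dists) for gt, dists in by_type.items()}


if __name__ == "__main__":
    # Toy, made-up annotation and peaks for illustration only.
    tss = index_tss([("chr1", 1000, "protein_coding"),
                     ("chr1", 50000, "lncRNA"),
                     ("chr2", 2000, "lncRNA")])
    peaks = [("chr1", 900, 1100), ("chr1", 40000, 40200), ("chr2", 9000, 9200)]
    print(median_distance_by_type(nearest_tss(peaks, tss)))
    # {'protein_coding': 0, 'lncRNA': 8500.0}
```

Comparing the per-type medians (or full distance distributions) is one simple way to check whether a factor tends to bind farther from lncRNA TSSs than from protein-coding TSSs; a real analysis would also account for strand and for genes with multiple TSSs.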
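The overlap between two factors' binding sites, such as c-Myc and CTCF, can be quantified as the fraction of one factor's peaks that intersect at least one peak of the other. The sketch below assumes BED-style half-open intervals given as hypothetical `(chrom, start, end)` tuples; this particular metric is an illustrative choice and not necessarily the one lncRScan-TFBS reports.

```python
from bisect import bisect_left
from itertools import accumulate


def build_peak_index(peaks):
    """Per chromosome: sorted peak starts and the running maximum of peak ends."""
    grouped = {}
    for chrom, start, end in peaks:
        grouped.setdefault(chrom, []).append((start, end))
    index = {}
    for chrom, intervals in grouped.items():
        intervals.sort()
        starts = [s for s, _ in intervals]
        max_ends = list(accumulate((e for _, e in intervals), max))
        index[chrom] = (starts, max_ends)
    return index


def overlap_fraction(peaks_a, peaks_b):
    """Fraction of peaks in A that overlap at least one peak in B.

    Intervals are half-open [start, end), as in BED files. The measure is
    asymmetric: overlap_fraction(a, b) generally differs from overlap_fraction(b, a).
    """
    if not peaks_a:
        return 0.0
    index_b = build_peak_index(peaks_b)
    hits = 0
    for chrom, start, end in peaks_a:
        if chrom not in index_b:
            continue
        starts, max_ends = index_b[chrom]
        k = bisect_left(starts, end)  # B peaks [0, k) start before this A peak ends
        # Among those, some B peak overlaps iff the largest end lies past our start.
        if k > 0 and max_ends[k - 1] > start:
            hits += 1
    return hits / len(peaks_a)


if __name__ == "__main__":
    # Made-up peak sets standing in for two factors' ChIP-Seq peaks.
    factor_a = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 50, 80)]
    factor_b = [("chr1", 150, 250), ("chr1", 1000, 1100)]
    print(overlap_fraction(factor_a, factor_b), overlap_fraction(factor_b, factor_a))
```

Reporting the fraction in both directions helps distinguish near-complete mutual overlap (as described for c-Myc and n-Myc) from the case where one factor's sites are largely a subset of another's.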
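For the enrichment step, one common choice (assumed here, not confirmed as the lncRScan-TFBS method) is a one-sided hypergeometric test: given N peaks of which K contain a motif, it asks how surprising it is that k of the n lncRNA-associated peaks contain that motif. The function name and argument order below are hypothetical.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """One-sided p-value P(X >= k) for X ~ Hypergeometric(N, K, n).

    N: total peaks; K: peaks containing the motif;
    n: lncRNA-associated peaks; k: lncRNA-associated peaks containing the motif.
    """
    if not (0 <= K <= N and 0 <= n <= N):
        raise ValueError("require 0 <= K <= N and 0 <= n <= N")
    upper = min(K, n)
    # Sum the probabilities of drawing i = k..upper motif-containing peaks.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(max(k, 0), upper + 1))
    return tail / comb(N, n)


if __name__ == "__main__":
    # Hypothetical counts: 1000 peaks, 200 with the motif; 50 lncRNA peaks, 20 with the motif.
    print(hypergeom_sf(20, 1000, 200, 50))
```

Exact integer arithmetic via `math.comb` keeps the sketch dependency-free; for large peak sets a statistics library's hypergeometric survival function would be faster, and p-values across many motifs would need multiple-testing correction.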
Keywords/Search Tags: long non-coding RNA, transcription factor binding site, lncRScan-TFBS toolkit, transcription start site