DNA Binding Sites Identification Of Transcription Factors Based On DNase High Throughput Sequencing

Posted on: 2020-12-26 | Degree: Master | Type: Thesis
Country: China | Candidate: R D Cong | Full Text: PDF
GTID: 2370330575973384 | Subject: Control Science and Engineering
Abstract/Summary:
Proteins are important products of gene expression and play a central role in life activities. Among the many kinds of proteins, transcription factors bind specifically to DNA and regulate gene expression, which makes them particularly important. The main goal of this study is to identify the binding sites of transcription factors accurately. At present, ChIP-Seq and DNase-Seq are the methods commonly used to detect transcription factor binding sites. Although ChIP-Seq is relatively mature, it has several drawbacks, such as high sequencing cost, difficulty in matching specific enzymes, and long turnaround time. The newer DNase-Seq technology avoids these problems, can survey a large range of genomic regions in a single experiment, and reaches single-base resolution. In this work, DNase-Seq data were downloaded from the ENCODE website, and a correction model based on a recurrent neural network (RNN) was designed to correct for the base preference of DNase digestion. ChIP-Seq data were also downloaded from the ENCODE website, and the binding sites of the transcription factors of interest were located precisely with the GEM and FIMO software to serve as positive samples. A decision threshold for each transcription factor's position weight matrix (PWM) was then obtained. Candidate binding sites that pass this threshold but were not found by ChIP-Seq were taken as negative samples. DNase cleavage values were extracted for all samples and corrected for base preference to form a DNase data set of binding sites for the transcription factors of interest. Based on this data set, a DNase-based recognition model for transcription factor binding sites using a convolutional neural network (CNN) was designed and implemented. The experimental results confirm the validity of the proposed method.
Keywords/Search Tags: Transcription factor binding sites, DNase-Seq, Recurrent neural network, Convolutional neural network
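
The PWM thresholding and negative-sample selection described in the abstract can be illustrated with a minimal Python sketch. This is not the thesis's implementation, which relies on GEM and FIMO; the function names, the toy count matrix, the pseudocount, the uniform background, and the threshold value are all assumptions chosen for illustration.

```python
import math

BASES = "ACGT"


def log_odds_pwm(counts, pseudocount=0.5, background=0.25):
    """Convert per-position base counts into a log-odds PWM.

    `counts` is a list of dicts (one per motif position) mapping each base
    to its count. A uniform background is assumed for simplicity.
    """
    pwm = []
    for column in counts:
        total = sum(column[b] for b in BASES) + 4 * pseudocount
        pwm.append({
            b: math.log2((column[b] + pseudocount) / total / background)
            for b in BASES
        })
    return pwm


def score_window(pwm, window):
    """Sum the log-odds scores of a window with the same width as the PWM."""
    return sum(col[base] for col, base in zip(pwm, window))


def scan(pwm, sequence, threshold):
    """Return (start, window, score) for every window scoring >= threshold."""
    width = len(pwm)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        if any(base not in BASES for base in window):
            continue  # skip windows containing N or other ambiguity codes
        score = score_window(pwm, window)
        if score >= threshold:
            hits.append((start, window, score))
    return hits


def label_candidates(hits, chip_supported_starts):
    """Split PWM hits into positives (supported by ChIP-Seq) and negatives."""
    positives = [h for h in hits if h[0] in chip_supported_starts]
    negatives = [h for h in hits if h[0] not in chip_supported_starts]
    return positives, negatives


# Toy example: a hypothetical 4-bp motif whose consensus is ACGT.
toy_counts = [
    {"A": 10, "C": 0, "G": 0, "T": 0},
    {"A": 0, "C": 10, "G": 0, "T": 0},
    {"A": 0, "C": 0, "G": 10, "T": 0},
    {"A": 0, "C": 0, "G": 0, "T": 10},
]
toy_pwm = log_odds_pwm(toy_counts)
toy_hits = scan(toy_pwm, "TTACGTTTACGTAA", threshold=5.0)
toy_pos, toy_neg = label_candidates(toy_hits, chip_supported_starts={2})
```

In this sketch, windows that pass the PWM threshold and coincide with a ChIP-Seq-supported position become positive samples, while the remaining passing windows become negatives. In a full pipeline, the DNase cleavage profile around each labeled site would then be extracted and passed to the classifier.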