Study On Prediction Of DNA-protein Binding Sites Based On Deep Neural Network

Posted on: 2024-05-30
Degree: Master
Type: Thesis
Country: China
Candidate: Y H Yin
GTID: 2530307157951239
Subject: Pattern Recognition and Intelligent Systems
Abstract/Summary:
Proteins that bind to specific nucleotide sequences upstream of a gene are called transcription factors. A transcription factor binding site is the DNA fragment to which a specific transcription factor binds; it is also called a motif and is often located upstream of a gene. Accurate prediction of DNA-protein binding sites (DPBS) is biologically important for studying the regulatory mechanisms of gene expression. In recent years, with the rapid development of bioinformatics technology, advanced deep neural networks have been introduced into this field and have significantly improved the prediction performance for DNA-protein binding sites. However, these methods rely primarily on DNA sequences measured by ChIP-seq and do not account for possible partial variations in motif sequences or for errors in the sequencing technology itself. Moreover, most prediction methods consider only the sequence information of DNA and ignore its shape features. In addition, at the model design stage most methods use fixed motif lengths to capture binding features in DNA sequences, even though the length of binding sites is not fixed, so the extracted features are insufficient. In response to these problems, this thesis designs two solutions for predicting DNA-protein binding sites, addressing feature representation and network structure design. The main work is as follows:

(1) This thesis considers both DNA sequence information and DNA shape features, and designs Shape-DenseNet, a deep neural network based on the dense convolutional network, for predicting DNA-protein binding sites. Hybrid coding provides more features for neural network training. Experiments show that combining DNA sequence information with shape features improves the prediction performance for DNA-protein binding sites.

(2) A fault-tolerant coding mechanism is proposed for converting DNA sequences into neural network inputs. This mechanism takes into account possible partial variations in motif sequences and errors in the sequencing technology itself, enriching the input features of the neural network. In addition, because transcription factor binding sites vary in length, an approach based on a multi-scale dense convolutional network, termed MSDenseNet, is proposed. Experiments show that combining fault-tolerant coding with the multi-scale dense convolutional network significantly improves the prediction performance for DNA-protein binding sites.

(3) A prediction system for DNA-protein binding sites is developed on the Spring Boot framework, allowing researchers in related fields to check easily and efficiently whether a query DNA sequence is predicted to contain transcription factor binding sites.
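
As a rough illustration of why scanning at several motif lengths can help when binding-site length is not fixed, the sketch below one-hot encodes a DNA string and slides kernels of different widths over it, keeping the best normalized response for each width. It is a minimal sketch under stated assumptions, not the thesis's MSDenseNet or its fault-tolerant coding: the function names, the example sequence, and the hand-written motif kernels are all hypothetical, and a real network would learn its kernels from data.

```python
# Illustrative sketch only. The names, motifs, and sequence below are
# hypothetical and do not come from the thesis.
BASES = "ACGT"


def one_hot(seq):
    """Encode a DNA string as rows of 4 floats (A, C, G, T).

    Bases outside ACGT (for example N) become all-zero rows.
    """
    return [[1.0 if base == b else 0.0 for b in BASES] for base in seq.upper()]


def conv1d_max(encoded, kernel):
    """Slide `kernel` along `encoded` and return the maximum response.

    This is a "valid" 1-D convolution followed by global max pooling.
    """
    width = len(kernel)
    best = 0.0
    for start in range(len(encoded) - width + 1):
        score = sum(
            encoded[start + i][j] * kernel[i][j]
            for i in range(width)
            for j in range(4)
        )
        best = max(best, score)
    return best


def multi_scale_features(seq, motifs):
    """Return one feature per motif: best match fraction at that kernel width.

    Each hypothetical kernel is the one-hot encoding of a motif string, so a
    value of 1.0 means the motif occurs exactly somewhere in the sequence.
    """
    encoded = one_hot(seq)
    return [conv1d_max(encoded, one_hot(m)) / len(m) for m in motifs]


# Kernels of width 3, 6, and 9 scanned over the same sequence.
features = multi_scale_features("TTGACGTCATT", ["TGA", "GACGTC", "ACGTCATTA"])
print(features)
```

In this toy setting the width-3 and width-6 kernels find exact matches while the width-9 kernel only matches partially, which is the kind of length-dependent signal that a single fixed kernel width would miss.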
Keywords/Search Tags: DNA-protein binding site, DNA shape features, Fault-tolerant coding, Dense convolutional network, Multi-scale convolution