Prediction Of Protein-DNA Interaction Hotspots Based On Neural Network

Posted on: 2023-03-13
Degree: Master
Type: Thesis
Country: China
Candidate: L N Cao
GTID: 2530307070484404
Subject: Engineering
Abstract/Summary:
Interactions between proteins and DNA play an important role in many life activities. Accurate and efficient identification of hotspot residues in protein-DNA interactions not only helps researchers better understand the underlying biomolecular mechanisms, but also provides a solid foundation for protein engineering and targeted drug design. Although researchers have identified some hotspot residues of protein-DNA interactions, most remain unknown. Traditional experimental methods are accurate but costly in time and money, so they cannot be used to identify hotspot residues on a large scale. Computational methods for this problem, however, are still very scarce. In this thesis, a convolutional neural network (CNN) and a long short-term memory (LSTM) network are used to predict the hotspot residues of protein-DNA interactions. The main research contents of this thesis are as follows:

1. Most existing methods for predicting protein-DNA interaction hotspots are based on traditional machine learning. This thesis proposes BPrPDH, a neural network model for hotspot prediction based on multi-feature fusion. Using the collected data set of protein-DNA interaction hotspot residues, the sequence, structural, network and solvent exposure features of the proteins are extracted. Combined with Euclidean neighborhood and Voronoi neighborhood structural features, these are assembled into a 300-dimensional feature vector. The BPrPDH model is then used to identify the hotspots of protein-DNA interactions. The experimental results show that BPrPDH performs better than traditional machine learning algorithms: its AUC is 0.717 in ten-fold cross-validation and 0.741 on the independent test set. In addition, the neighborhood structure features of the protein improve the prediction performance of the model.

2. BPrPDH depends on structural information, but the structures of a large number of proteins are still unknown, so their structural, network and solvent exposure features cannot be extracted. To address this, deepPrPDH, a hotspot prediction model for protein-DNA interactions based on sequence features alone, is proposed. deepPrPDH encodes the protein sequence according to its sequence features and integrates sequence features of unequal length into a vector matrix by padding and compression. The hotspot residues of protein-DNA interactions are then predicted by a CNN and an LSTM. The experimental results show that, compared with one-hot encoding and sequential encoding, k-mer encoding is more conducive to hotspot prediction. In ten-fold cross-validation, the AUC of deepPrPDH reached 0.734.

3. Because there are few studies on the hotspots of protein-DNA interactions, PDH, a platform for hotspot residues of protein-DNA interactions, is constructed. The platform provides the hotspot residue data set used in this thesis and supports query, browsing, download and prediction functions.

This work is of great significance for the prediction of hotspot residues in protein-DNA interactions.
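The sequence-encoding step described in item 2 can be illustrated with a minimal sketch. It is not the thesis's implementation: the abstract does not give the k-mer size, the vocabulary, or how "compression" is performed, so the sketch assumes overlapping 3-mers over the 20 standard amino acids, a reserved index for padding and another for unknown k-mers, and simple truncation to a fixed length as a stand-in for compression. The names `build_vocab` and `encode` are hypothetical.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
PAD, UNK = 0, 1  # reserved indices (assumed convention, not from the thesis)


def build_vocab(k):
    """Map every possible k-mer over the 20 residues to an integer index >= 2."""
    kmers = ("".join(p) for p in product(AMINO_ACIDS, repeat=k))
    return {kmer: i + 2 for i, kmer in enumerate(kmers)}


def encode(seq, vocab, k, max_len):
    """Encode a sequence as overlapping k-mer indices of fixed length max_len.

    Sequences with more than max_len k-mers are truncated (assumed stand-in
    for the 'compression' step); shorter ones are right-padded with PAD.
    """
    ids = [vocab.get(seq[i:i + k], UNK) for i in range(len(seq) - k + 1)]
    ids = ids[:max_len]
    return ids + [PAD] * (max_len - len(ids))


# Two toy sequences of unequal length become rows of one fixed-width matrix.
vocab = build_vocab(3)
batch = [encode(s, vocab, 3, 8) for s in ["MKVLAAG", "ACDXEFGHIKLMN"]]
```

Each row of `batch` has the same length, so the rows can be stacked into the kind of matrix a CNN or LSTM embedding layer takes as input. Any k-mer containing a non-standard residue (such as `X` in the second toy sequence) maps to the `UNK` index.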
Keywords/Search Tags: machine learning, binary classification, neural network, protein, DNA, hotspot, hotspot prediction