Research On Prediction Of Protein-ATP And Protein-DNA Binding Sites Based On Deep Learning

Posted on: 2022-06-23
Degree: Master
Type: Thesis
Country: China
Candidate: Z Y Pei
Full Text: PDF
GTID: 2480306329959079
Subject: Computer application technology
Abstract/Summary:
The binding of proteins to ligands is indispensable for many biological processes, such as membrane transport, muscle contraction, gene expression and virus replication. Proteins bind many kinds of ligands, among which ATP and DNA are particularly important. As of March 2021, the PDB database contained 26,852 ATP-binding proteins, accounting for only 15.6% of all known protein structures, and 31,896 DNA-binding proteins of known structure, accounting for only 18.6%. Our understanding of ATP-binding and DNA-binding proteins therefore remains far from complete, and the study of protein-ATP and protein-DNA binding sites is of great significance for exploring proteins of unknown structure.

For a long time, structural information on binding proteins has been obtained by X-ray crystal diffraction, NMR and cryo-electron microscopy. These methods are expensive, time-consuming and inefficient, and they cannot meet the needs of large-scale protein data. With the development of computer science and continuing advances in artificial intelligence, computational prediction of protein-ATP and protein-DNA binding sites has attracted considerable attention. The prevailing approach extracts as many features as possible from the binding protein sequence, handles the imbalanced samples in the data set with sampling techniques, and then applies a classification algorithm. Although this framework is widely used, it has proved very difficult to improve its prediction accuracy further.

To address these problems, this paper combines deep learning with traditional feature construction and uses deep learning training strategies together with ensemble learning to build an accurate prediction model. Starting from the position-specific scoring matrix, secondary structure, solvent accessibility and sequence features of proteins, we use a sliding window to combine the feature values of each target residue with those of its adjacent residues and feed the result to deep learning models. Finally, we integrate the models by ensemble learning to predict protein-ATP and protein-DNA binding sites. Comparison with other classifiers demonstrates the feasibility of the method and shows that it is helpful for predicting protein-ATP and protein-DNA binding sites.

The main work of this paper is as follows:
(1) The sliding window principle is used to integrate the four kinds of features into feature data for training and testing deep neural networks.
(2) The Inception-base network model is designed following the design idea of the Inception network; it alleviates the overfitting and vanishing-gradient problems caused by the small amount of biological data, at a relatively low computational cost. The Inception-evolution network model is designed by improving on this network: it decomposes each large convolution kernel into several small convolution kernels and applies an activation function after each decomposed kernel, which increases the nonlinear expressive capacity of the model. The Inception-res network model is obtained by introducing the idea of residual network construction, which further reduces the consumption of computing resources while increasing the network depth.
(3) Training strategies suited to the data sets are introduced: the Focal loss function and the Warm-up optimization method, which address the sample imbalance problem and the overfitting that occurs at the initial stage of training, respectively. The network models are adjusted to fit these training strategies.
(4) Ensemble learning is introduced to integrate the optimal models of the three networks, and the best experimental results are obtained by weighted decision-making.
(5) A case study is used to verify the generality of the method.

The experimental results show that the model designed in this paper performs better than the comparison methods. In the future, we will try to improve the precision of the model through experimental verification and by integrating more protein-related features.
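The pipeline steps named in the abstract (sliding-window features, Focal loss, Warm-up and weighted ensemble decisions) can be illustrated with a minimal sketch. It is not the thesis implementation: the function names, the zero padding at the sequence ends, the linear warm-up schedule and the default values of `alpha` and `gamma` are all assumptions made for illustration, since the abstract does not state the window size, loss parameters or ensemble weights.

```python
import math
from typing import List, Sequence


def sliding_windows(features: Sequence[Sequence[float]], window: int) -> List[List[float]]:
    """Concatenate each residue's feature vector with those of its neighbours.

    Positions that fall outside the sequence are zero-padded (an assumption).
    """
    if window % 2 == 0:
        raise ValueError("window must be odd so the target residue is centred")
    half = window // 2
    pad = [0.0] * len(features[0])
    rows = []
    for i in range(len(features)):
        row: List[float] = []
        for j in range(i - half, i + half + 1):
            row.extend(features[j] if 0 <= j < len(features) else pad)
        rows.append(row)
    return rows


def focal_loss(p: float, y: int, alpha: float = 0.25, gamma: float = 2.0) -> float:
    """Binary focal loss for one residue: -alpha_t * (1 - p_t)**gamma * log(p_t).

    p is the predicted binding probability, y is 1 for a binding residue.
    The alpha and gamma defaults are illustrative, not the thesis settings.
    """
    p = min(max(p, 1e-7), 1.0 - 1e-7)  # clip to avoid log(0)
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


def warmup_lr(step: int, base_lr: float, warmup_steps: int) -> float:
    """Linear warm-up (assumed schedule): ramp up to base_lr, then hold it."""
    if step < warmup_steps:
        return base_lr * (step + 1) / warmup_steps
    return base_lr


def weighted_ensemble(probs: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted average of the per-model binding probabilities for one residue."""
    return sum(p * w for p, w in zip(probs, weights)) / sum(weights)


if __name__ == "__main__":
    # Three residues with one toy feature each, window of three residues.
    print(sliding_windows([[1.0], [2.0], [3.0]], window=3))
    # A well-classified binding residue contributes far less loss than a missed one.
    print(focal_loss(0.9, 1), focal_loss(0.1, 1))
    # Hypothetical weights for three models' outputs on one residue.
    print(weighted_ensemble([0.2, 0.8, 0.6], [1.0, 2.0, 1.0]))
```

The `(1 - p_t)**gamma` factor down-weights residues the model already classifies confidently, so the abundant non-binding residues dominate the loss less; this is why a focal loss is a natural fit for the imbalanced binding-site data the abstract describes.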
Keywords/Search Tags: Bioinformatics, Prediction of protein-ATP binding sites, Prediction of protein-DNA binding sites, Feature extraction, Deep learning, Ensemble learning