
Identification of Protein-Metal Ion Ligand Binding Sites Based on Deep Learning Algorithm

Posted on: 2022-09-06
Degree: Master
Type: Thesis
Country: China
Candidate: K Sun
GTID: 2480306542978909
Subject: Physical Electronics
Abstract/Summary:
Proteins are an important component of all cells and tissues of the human body and are the main carriers of metabolism and cellular life activities. Proteins cannot perform their functions without ligands: only when a protein and a ligand combine to form a stable complex can the function be realized. Metal ions, as important protein-binding ligands, affect many essential physiological processes in the human body, so predicting their binding residues is of great significance. Such prediction not only improves the understanding of protein function but also provides important theoretical support for disease diagnosis, prevention and treatment at the cellular and molecular levels, and for targeted and molecular drug design. At the same time, metal ion ligands are difficult to predict because of their small volume and active physicochemical properties. The prediction of metal ion ligand binding sites is therefore an important and complex research task. Based on amino acid sequence information, this work selects amino acids, their physicochemical information and predicted structural information as feature parameters, and uses two deep learning algorithms to predict the binding sites of metal ion ligands. The specific work is as follows:

(1) A metal ion ligand dataset was constructed. Since the negative set in the dataset is much larger than the positive set, undersampling is used to build a dataset with equal numbers of positive and negative fragments for prediction. Based on the amino acid sequence, the amino acid, charge, hydrophilicity, relative solvent accessibility and secondary structure are selected as feature parameters, and their component information, site conservation information and information entropy are extracted. In addition, because a protein sequence contains disordered regions of uncertain conformation as well as fragments with a fixed structure, a new feature parameter, the disorder value, is introduced and classified after statistical analysis. Because specific binding residues show a preference for certain amino acids and physicochemical properties when ligands bind to proteins, propensity factors are introduced, and the binding residues and their physicochemical characteristics are extracted as new feature parameters.

(2) The Recurrent Neural Network (RNN), a deep learning algorithm, is used to predict the ligand binding sites of five metal ions (Zn2+, Fe3+, Co2+, Ca2+, Na+). The dataset is processed by undersampling, which avoids the effect of the imbalance between positive and negative sets on the algorithm. Hyper-parameter optimization is performed during training, and three hyper-parameters are tuned: the number of hidden layers, the number of hidden-layer nodes, and the batch size. After hyper-parameter optimization, the overall prediction results improve markedly: the Sn values of three ion ligands increase by more than 2.5%, the Sp and Acc values of all five metal ion ligands increase, and the MCC values change noticeably, all increasing by about 0.08. Compared with the Random Forest (RF) algorithm, the Sn value of the RNN algorithm is only slightly higher, but the Sp and Acc values are improved; the improvement is most pronounced for the Na+ ligand, at 10.7% and 10.4%, respectively. The MCC value of the RNN algorithm is also better than that of the RF algorithm, with the Zn2+ and Fe3+ ligands showing increases of 0.078. Judged by the four evaluation indexes, the RNN algorithm has better prediction performance than the RF algorithm in this study.

(3) Another representative deep learning algorithm, the Deep Neural Network (DNN), is used to predict the ligand binding sites of alkaline earth metal ions (Mg2+ and Ca2+). After optimization of the hyper-parameters, the prediction results improve further, and the results of the 5-fold cross-validation test are better than those of the advanced Ionseq method. The Sn and MCC values of the DNN algorithm are higher than those of the Ionseq method: the Sn value improves by more than 3.7% and the MCC value by more than 0.02. To further verify the validity of the DNN-based prediction model, the DNN algorithm is applied to the dataset processed by undersampling, and the results of the independent test are better than those of the SVM algorithm on the same dataset. The DNN algorithm outperforms the SVM algorithm except that the Sn value for the Mg2+ ligand is slightly lower, and the Sn value for the Ca2+ ligand is 11.6% higher than that of the SVM algorithm. With the introduction of disorder values as feature parameters and propensity factors as a feature extraction method, the DNN predictions improve on all four metrics, with the Sn value improving by more than 1.7% and the MCC value by more than 0.027. This indicates that disorder values and propensity factors contribute to the prediction of ligand binding sites for alkaline earth metal ions.
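The undersampling step and the four evaluation indexes mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give formulas, so the code assumes the conventional confusion-matrix definitions of Sn (sensitivity), Sp (specificity), Acc (accuracy) and MCC (Matthews correlation coefficient). The function names `undersample` and `evaluate`, the fragment labels and the example counts are hypothetical and are not taken from the thesis.

```python
import math
import random


def undersample(positives, negatives, seed=0):
    """Draw equally many fragments from each class at random.

    Assumption: simple random undersampling; the thesis may use a
    different selection scheme.
    """
    rng = random.Random(seed)
    k = min(len(positives), len(negatives))
    # The smaller class is kept in full (in shuffled order); the larger
    # class is reduced to k randomly chosen fragments.
    return rng.sample(positives, k), rng.sample(negatives, k)


def evaluate(tp, fp, tn, fn):
    """Compute Sn, Sp, Acc and MCC from confusion-matrix counts.

    Assumption: the conventional definitions of the four indexes.
    """
    sn = tp / (tp + fn) if (tp + fn) else 0.0
    sp = tn / (tn + fp) if (tn + fp) else 0.0
    acc = (tp + tn) / (tp + fp + tn + fn)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"Sn": sn, "Sp": sp, "Acc": acc, "MCC": mcc}


# Hypothetical data: 3 binding fragments versus 10 non-binding fragments.
pos = ["P1", "P2", "P3"]
neg = [f"N{i}" for i in range(10)]
bal_pos, bal_neg = undersample(pos, neg)

# Hypothetical confusion-matrix counts on a balanced test set.
scores = evaluate(tp=40, fp=10, tn=40, fn=10)
```

Because Sn and Sp are computed per class while MCC uses all four counts, reporting them together, as the abstract does, shows whether a gain in one class comes at the expense of the other.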
Keywords/Search Tags:Protein-ligand binding site, Metal ion ligand, Recurrent neural network, Deep neural network