
Add New Feature Parameters To Identify Protein Metal Ion Ligand Binding Residues

Posted on: 2022-04-25
Degree: Master
Type: Thesis
Country: China
Candidate: S Xu
Full Text: PDF
GTID: 2481306542478794
Subject: Mathematics
Abstract/Summary:
Proteins are an essential part of all cells and tissues in the human body. They cannot perform their biological functions without interacting with ligands, and binding to metal ion ligands plays an important biological role. In this paper, the binding residues of 10 metal ion ligands (Zn2+, Cu2+, Fe2+, Fe3+, Co2+, Mn2+, Ca2+, Mg2+, Na+ and K+) were taken as the research objects. Based on protein sequence information, two new feature parameters, the amino acid correlation feature and the binding residue propensity factor, were added, and the Gradient Boosting Machine (GBM) algorithm was used to predict metal ion ligand binding residues. The main research contents of this paper are as follows:

(1) Previous work did not consider the correlation between amino acids when extracting feature parameters from fragments. We therefore compiled statistics on amino acid association information and found that associations involving the adjacent, second-neighbor and third-neighbor positions of the binding residues occurred with higher probability. After further screening, the first 100 dimensions showing a large difference between the positive and negative sets were divided into 10 types of amino acid correlation features and used as new feature parameters.

(2) The feature parameters used in previous work were extracted from fragments, but specific binding residues also show preferences in amino acid usage. Statistical analysis of the amino acids preferred by binding residues showed that they differ significantly from those preferred by non-binding residues. These preferences were therefore extracted as binding residue propensity factors and used as a new feature parameter.

(3) The amino acid correlation features and binding residue propensity factors were combined with the basic feature parameters (amino acids, predicted structural characteristics and physicochemical characteristics), and the GBM algorithm was used to identify metal ion ligand binding residues. After optimizing four parameters of the GBM algorithm, 5-fold cross-validation gave good prediction results: the sensitivity (Sn) and Matthews correlation coefficient (MCC) values for the 10 metal ion ligands were higher than 10.17% and 0.297, and for the transition metals they were higher than 34.46% and 0.564. To test the effectiveness of the prediction model, the Random Forest (RF) algorithm was used to identify the binding residues with the same prediction parameters, and good results were also obtained. Overall, the GBM algorithm performed better than Random Forest. Compared with previous results, the Sn and MCC values obtained by the GBM algorithm for the Cu2+, Fe2+, Mn2+, Mg2+ and K+ ligands were better than those of Ionseq.

(4) Because the data set of metal ion ligand binding residues is severely imbalanced, we applied the commonly used undersampling method to the imbalanced data set and further analyzed the recognition of metal ion ligand binding residues. The prediction parameters were amino acids, amino acid correlation features, secondary structure and relative solvent accessibility; the information entropy of hydrophilicity-hydrophobicity and polarization charge; and the amino acid propensity factors. With 5-fold cross-validation and an independent test, the GBM algorithm obtained better prediction results. Comparison with results in the literature shows that the independent test achieved better predictions than previous work.
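The two evaluation measures and the undersampling step mentioned above can be illustrated with a minimal Python sketch. It is not the thesis's implementation: Sn and MCC follow their standard confusion-matrix definitions, while the function names, the 1:1 negative-to-positive `ratio`, the random seed and the toy counts are all assumptions chosen for illustration.

```python
import math
import random


def sensitivity(tp, fn):
    """Sn = TP / (TP + FN): share of true binding residues that were predicted."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0 when any marginal count is zero.
    return (tp * tn - fp * fn) / denom if denom else 0.0


def undersample(samples, labels, ratio=1.0, seed=0):
    """Keep every positive (label 1) and a random subset of negatives (label 0).

    `ratio` is the assumed number of negatives kept per positive;
    the thesis does not state the ratio it used.
    """
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    n_keep = min(len(neg), int(round(ratio * len(pos))))
    keep = sorted(pos + rng.sample(neg, n_keep))
    return [samples[i] for i in keep], [labels[i] for i in keep]


if __name__ == "__main__":
    # Hypothetical toy set: 5 binding residues among 100 residues.
    labels = [1] * 5 + [0] * 95
    samples = list(range(100))
    xs, ys = undersample(samples, labels, ratio=1.0, seed=42)
    print("kept residues:", len(xs), "positives:", sum(ys))
    # Hypothetical confusion-matrix counts, not results from the thesis.
    print("Sn:", sensitivity(tp=40, fn=60))
    print("MCC:", round(mcc(tp=40, tn=900, fp=20, fn=60), 3))
```

Undersampling would normally be applied only to the training folds, so that cross-validation and independent-test scores still reflect the original class imbalance.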
Keywords/Search Tags: Metal ion ligand, Binding residues, Amino acid correlation features, Binding residue propensity factors, Gradient Boosting Machine (GBM) algorithm