
A Database Of Alanine Mutagenic Effects For Protein-Nucleic Acid Interface And The Study On The Interface Hot Spots

Posted on: 2019-05-05
Degree: Master
Type: Thesis
Country: China
Candidate: L Liu
GTID: 2310330542993897
Subject: Biochemistry and Molecular Biology
Abstract/Summary:
Protein-nucleic acid interactions play essential roles in various biological activities, such as gene regulation, transcription, DNA repair, and DNA packaging. Understanding the effects of amino acid substitutions on protein-nucleic acid binding affinities can help elucidate the molecular mechanism of protein-nucleic acid recognition; it also helps to find solutions to complex diseases that involve the dysregulation of protein-nucleic acid interactions. Until now, no comprehensive and up-to-date database of quantitative binding data on alanine mutagenic effects for protein-nucleic acid interactions has been publicly accessible. We therefore developed a new database of Alanine Mutagenic Effects for Protein-Nucleic Acid Interactions (dbAMEPNI). dbAMEPNI is a manually curated, literature-derived database. It contains a core set comprising over 577 alanine mutagenic data with experimentally determined binding affinities for protein-nucleic acid complexes. The database records several important parameters, such as the dissociation constant (Kd), the Gibbs free energy change (ΔΔG), experimental conditions, and structural parameters of the mutated residues. In addition, the database provides an extended dataset of 282 single alanine mutations for which only qualitative data (or descriptive effects) of thermodynamic information are available. Database URL: http://zhulab.ahu.edu.cn/dbAMEPNI/.

Based on the alanine scanning data collected in dbAMEPNI, we developed a knowledge-based model to predict the hot spots on protein-nucleic acid interfaces. Hot spots are a small set of residues that contribute most of the binding affinity of a protein-nucleic acid interaction. Compared with the extensive studies of hot spots on protein-protein interfaces, studies of hot spot residues on protein-nucleic acid interfaces remain rare. One reason is that mutagenic data for protein-nucleic acid interactions are not as abundant as those for protein-protein interactions. In this study, we collected 503 alanine mutagenic data of protein-nucleic acid interactions from dbAMEPNI for which the thermodynamic effects were recorded. After removing redundancy, we obtained 358 alanine mutagenic data on protein-nucleic acid interfaces, of which 299 were used as the training dataset for our hot spot prediction model, and the remaining 59 were used as an independent test set to evaluate the generalization ability of the model.

To build our model, we generated 97 different structural features and used a decision tree and sequential forward feature selection to select the relevant features. The final model was built on only 10 features using a support vector machine (SVM). The features include two unique features proposed in this study, ASASsa1/2 and esp3. The former is the square root of the buried absolute solvent accessible surface area of the residue side chain, and the latter is the patch electrostatic potential around the target residue. In cross-validation on the training dataset, our model gave recall, precision, accuracy, and F1 score of 0.640, 0.764, 0.840, and 0.696, respectively, compared with 0.419, 0.350, 0.609, and 0.381 for mCSM-NA, a state-of-the-art model for predicting the thermodynamic effects of protein-nucleic acid interactions. Our model, iPNHOT, was further tested on an independent test set of 59 residues on protein-RNA interfaces, of which 3 are hot spot and 56 are non-hot spot residues according to our definition. On this set, our model gave recall, precision, accuracy, and F1 score of 0.667, 0.400, 0.932, and 0.500, compared with 1.00, 0.100, 0.542, and 0.182 for mCSM-NA.
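
The independent-test figures can be checked against confusion-matrix counts. The sketch below is illustrative only. The abstract reports metrics, not counts, so the true/false positive and negative counts used here are inferred (an assumption) from the reported recall and precision together with the stated 3 hot spot and 56 non-hot spot residues. The function name `classification_metrics` is hypothetical and not part of iPNHOT or mCSM-NA.

```python
def classification_metrics(tp, fp, tn, fn):
    """Return recall, precision, accuracy and F1 from confusion-matrix counts."""
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"recall": recall, "precision": precision,
            "accuracy": accuracy, "f1": f1}


# ASSUMPTION: counts inferred from the reported metrics, not given in the text.
# Test set: 3 hot spots (tp + fn = 3) and 56 non-hot spots (fp + tn = 56).
ipnhot = classification_metrics(tp=2, fp=3, tn=53, fn=1)
mcsm_na = classification_metrics(tp=3, fp=27, tn=29, fn=0)

for name, metrics in (("iPNHOT", ipnhot), ("mCSM-NA", mcsm_na)):
    print(name, {key: round(value, 3) for key, value in metrics.items()})
```

With only 3 hot spots among 59 residues, the classes are strongly imbalanced. Under these inferred counts, a predictor that flags many residues as hot spots reaches perfect recall at the cost of low precision and accuracy, which is why the F1 score is the more informative single number here.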
Keywords/Search Tags: Hot spots, Protein-Nucleic acid interactions, Machine Learning, Support Vector Machine