
In Silico Prediction of DNA-binding Residues in DNA-binding Proteins

Posted on: 2009-05-06
Degree: Master
Type: Thesis
Country: China
Candidate: X S Guo
Full Text: PDF
GTID: 2120360242980934
Subject: Biochemistry and Molecular Biology
Abstract/Summary:
Protein-DNA interactions play a key role in many fundamental biological processes, including DNA replication, DNA repair, transcription, DNA recombination, and chromatin formation. Although structural data are available for a few hundred protein-DNA complexes, the molecular recognition mechanism is still poorly understood. With the rapid accumulation of sequence data for DNA-binding proteins, reliable identification of DNA-binding residues is important for functional annotation, site-directed mutagenesis, and modeling of protein-DNA interactions.

In this study, the coordinates of 673 protein-DNA complexes were downloaded from the Protein Data Bank (PDB). Redundant protein sequences were removed with the BLAST program, leaving 216 non-redundant protein-DNA complexes. Support vector machines (SVMs) were then trained to predict DNA-binding residues using four sequence-derived features: the position-specific scoring matrix (PSSM), hydrophobic index, net charge, and amino acid dipole. To construct the SVM classifier, DNA-binding and non-DNA-binding residues were defined by calculating the Euclidean distances between the atoms of each residue in the DNA-binding protein and the atoms of the bases in the DNA strands, and by calculating the solvent accessible surface area (ASA) of every residue.

Interestingly, the PSSM was the best single feature for prediction, suggesting that the evolutionary information in a protein sequence contributes more than the other features to the identification of DNA-binding residues. This agrees with conclusions reported in previous work. Predictive performance improved further when multiple features were combined in the SVM classifier. The classifier trained on all four features achieved 67.76% sensitivity and 77.48% specificity.
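The distance-based part of the residue definition described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the abstract does not give the distance cutoff, so the 3.5 Å value is hypothetical, the function names `is_binding` and `label_residues` are invented here, and the ASA criterion mentioned in the abstract is left out because its threshold is not stated.

```python
import math

# Hypothetical cutoff in angstroms; the abstract does not state the value used.
CUTOFF = 3.5


def is_binding(residue_atoms, dna_base_atoms, cutoff=CUTOFF):
    """Return True if any residue atom lies within `cutoff` of any DNA base atom."""
    return any(
        math.dist(a, b) <= cutoff
        for a in residue_atoms
        for b in dna_base_atoms
    )


def label_residues(protein, dna_base_atoms, cutoff=CUTOFF):
    """Map each residue id to 1 (DNA-binding) or 0 (non-binding)."""
    return {
        residue_id: int(is_binding(atoms, dna_base_atoms, cutoff))
        for residue_id, atoms in protein.items()
    }


# Toy coordinates (x, y, z) for illustration only, not taken from any PDB entry.
protein = {
    "ARG10": [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)],
    "LEU42": [(20.0, 0.0, 0.0)],
}
dna_base_atoms = [(3.0, 0.0, 0.0)]

labels = label_residues(protein, dna_base_atoms)
print(labels)
```

In practice the atom coordinates would be parsed from the PDB files of the complexes, and the resulting labels would serve as the training targets for the classifier.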
The classifier was also evaluated with the receiver operating characteristic (ROC) curve; the best area under the curve (AUC) was 0.7882.

To determine whether prediction strength was affected by sequence context, the sliding window size was varied from 3 to 17 residues, with all four features used to encode the sequences in the dataset. A sliding window of 13 residues was found to be optimal for predicting DNA-binding residues.

Finally, the SVM classifier was used to predict DNA-binding residues in three DNA-binding proteins, taken from rat, Arabidopsis thaliana, and bacteriophage lambda, respectively. Compared with the DNA-binding residues reported in previous experimental work, almost all were predicted, with only a few missed.
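As a rough illustration of the sliding-window encoding, the sketch below builds one input vector per residue by concatenating the feature values of the residue and its neighbours. The zero padding at the sequence termini, the function name `window_features`, and the toy feature values are all assumptions; the abstract does not describe how positions beyond the ends of the sequence were handled.

```python
def window_features(per_residue, window=13):
    """Concatenate per-residue feature vectors over a centred sliding window.

    `per_residue` is a list of equal-length feature lists, one per residue.
    Positions that fall outside the sequence are zero-padded (an assumption).
    """
    if window % 2 == 0:
        raise ValueError("window size must be odd so it can be centred")
    half = window // 2
    n_features = len(per_residue[0])
    pad = [0.0] * n_features
    encoded = []
    for i in range(len(per_residue)):
        row = []
        for j in range(i - half, i + half + 1):
            inside = 0 <= j < len(per_residue)
            row.extend(per_residue[j] if inside else pad)
        encoded.append(row)
    return encoded


# Toy input: 3 residues with 2 made-up features each.
toy = [[1.0, 0.5], [2.0, -0.5], [3.0, 0.0]]
encoded = window_features(toy, window=3)
print(encoded[0])
```

With the window of 13 residues reported as optimal, each residue's vector would have 13 times the per-residue feature count, and those vectors would be the inputs passed to the SVM.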
Keywords/Search Tags: DNA-binding residue, sequence features, support vector machine (SVM)