
Research And Applications On Method Of Protein Binding Site Prediction

Posted on: 2013-02-10
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Z J Qiu
GTID: 1114330371996735
Subject: Biomedical engineering
Abstract/Summary:
Biomolecules and many other organic ligands can bind to proteins with high affinity at specific sites on the protein surface. The question of what distinguishes such recognition sites from other surface regions of proteins has been the subject of intense experimental and theoretical research. In recent years, the ability to predict putative binding regions on the surface of protein molecules has become increasingly important. With the rapidly growing structural knowledge of proteins of biological and medical importance, such prediction methods are becoming more widely applicable and can help in rational drug design and in elucidating the function of a protein molecule. Both applications, function prediction and rational drug design, require a reliable method for identifying and characterizing the ligand-binding sites of a protein. The availability of 3D structures of many proteins in complex with proteins or other types of ligands (lipids, nucleic acids or drug-like molecules) allows the systematic comparison of protein surfaces involved in interactions. Comparative studies of the amino acid distribution and physicochemical features of protein-protein interfaces and of proteins in complex with small organic drug-like ligands have made it possible to characterize recognition sites. A variety of computational methods have been developed that try to integrate this information to predict putative binding sites in proteins. However, current methods lack adequate prediction precision, and binding site prediction needs further study to improve its performance and to identify the crucial factors that affect it. This dissertation addresses protein binding site prediction and is divided into four chapters.

In chapter I, the principles of protein-ligand interaction are first described, including thermodynamic theory, theoretical models of the binding process and physical properties.
Second, the current status of research on protein binding site prediction is summarized in two parts: protein-ligand binding site prediction and protein-protein binding site prediction. Finally, the main work and findings of the dissertation are outlined.

In chapter II, two novel amino acid composition preference models are proposed, which take atoms and atomic contact pairs, respectively, as the statistical objects, in contrast to the traditional residue-based model. Tests of a binding pocket identification method based on the entire pocket show that the atom-based and atomic contact pair-based models perform better than the residue-based model. Because so-called hotspots exist in a binding pocket, the local region with the highest residue preference value is treated as a hotspot, and its preference value characterizes the pocket. A local residue preference-based binding pocket identification method is constructed by combining hotspot preference and pocket size. Compared with several recently published prediction methods, this method achieves equivalent accuracy in less time.

In chapter III, considering the differences in geometric features and physicochemical properties between protein-protein binding sites and protein-ligand binding sites, two residue-characterizing models are proposed: a single-patch model and a multiple-patch model. Based on residue features computed with these models, binding residue classifiers are constructed using a machine learning algorithm, random forest. Furthermore, a novel clustering method is proposed to find and predict binding sites. These methods have been used to predict protein-ligand binding residues and protein-protein binding residues. On the same dataset and with the same success criteria, the random forest classifier using the single-patch model predicts protein-ligand binding sites better than Q-SiteFinder, SCREEN and Morita's method.
The balanced accuracy and CC (Correlation Coefficient) values show that the multiple-patch based classifier predicts protein-protein binding residues better than Yan's method, Wang's method and Chen and Jeong's method. For protein-protein binding site prediction, the classifier also performs better than Bradford and Westhead's method, Bradford and Needham's method, and Higa and Tozzi's method.

In chapter IV, the random forest-based protein binding site prediction methods are applied to assist molecular docking. For protein-ligand docking, the random forest classifier is used at the front end to reduce the conformational search space. The docking results show that this method finds docking sites more accurately than the binding site predictor in Accelrys Discovery Studio. For protein-protein docking, the random forest classifier is used at the back end as a type of scoring function to select near-native poses. The docking results show that this method performs comparably to the ZDOCK scoring function and is complementary to it.

Finally, the conclusion summarizes the main content of this thesis and describes prospects for future research.

We gratefully acknowledge financial support for this work from the National Natural Science Foundation (grant 10772042), the National High-tech Research and Development Program (grant 2006AA01A124) and the Major State Basic Research Project (grant 2009CB918501) of China.
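
The balanced accuracy and CC values used in the abstract to compare binding residue classifiers can be illustrated with a minimal sketch. The abstract does not give the exact definitions, so this assumes the common conventions: balanced accuracy as the mean of sensitivity and specificity, and CC as the Matthews correlation coefficient. The function names and the 1 = binding, 0 = non-binding label encoding are hypothetical, chosen only for illustration.

```python
import math


def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP, FN for per-residue labels (1 = binding, 0 = non-binding)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def balanced_accuracy(tp, tn, fp, fn):
    """Mean of sensitivity and specificity (assumed definition)."""
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return (sensitivity + specificity) / 2.0


def correlation_coefficient(tp, tn, fp, fn):
    """Matthews correlation coefficient (assumed to be the CC in the abstract)."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


if __name__ == "__main__":
    # Toy labels for eight surface residues; not data from the dissertation.
    y_true = [1, 1, 1, 0, 0, 0, 0, 0]
    y_pred = [1, 1, 0, 0, 0, 0, 1, 0]
    counts = confusion_counts(y_true, y_pred)
    print("balanced accuracy:", round(balanced_accuracy(*counts), 3))
    print("CC:", round(correlation_coefficient(*counts), 3))
```

Both measures are less sensitive to class imbalance than plain accuracy, which matters here because binding residues are usually a small fraction of the protein surface.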
Keywords/Search Tags: Amino Acid Composition Preference, Random Forest Algorithm, Residue Property Definition Model, Molecular Docking