
Ab-initio Prediction Of Residue Contacts And Its Application To Protein 3D Structure Modeling

Posted on: 2015-02-04; Degree: Master; Type: Thesis
Country: China; Candidate: J Yang; Full Text: PDF
GTID: 2180330452963982; Subject: Control Engineering
Abstract/Summary:
Proteins play essential roles in crucial cellular processes in every living organism. Protein function is closely related to protein structure, so knowing a protein's structure helps in understanding its function and can guide drug design. In recent years, despite the increasing number of experimentally solved structures, the gap between the number of available protein sequences and the number of known structures has continued to widen. Fortunately, with rapid advances in machine learning and data mining, it has become feasible to predict protein structures directly from protein sequences.

In the literature, methods for protein structure prediction are generally grouped into three categories: homology modeling, fold recognition, and ab initio prediction. For the first and last of these, one common challenge is how to generate the residue-residue contact map that is then used as a set of constraints in protein structure assembly. In homology modeling, residue contact information is mainly derived from known homologous structures in the Protein Data Bank (PDB), whereas in ab initio prediction the contact map is mostly obtained from sequence-based predictions. Ab initio prediction is therefore the more useful approach when homologous structures are not available.

Over the last decades, many approaches have been proposed for residue-residue contact prediction, but their accuracy remains far from satisfactory. In this thesis, we predict residue contacts from the primary sequence by merging a machine learning method with a sequence alignment approach, and then use the predicted contacts as constraints for protein structure modeling. Experimental results indicate that the predicted contacts are valid for structure modeling and hence can improve the accuracy of protein structure prediction. Concretely, this work consists of two topics: (1) inter-transmembrane helix (TMH) residue contact prediction; (2) disulfide bond connectivity pattern prediction.

For inter-TMH residue contact prediction, we present a new method that merges a machine learning-based method with a correlated mutation analysis-based approach. We use partial correlation analysis to calculate the correlated mutation score, and the machine learning-based engine in the proposed protocol is implemented as an ensemble classifier. This is the first time that the correlated mutation score has been fused at the decision level. The results demonstrate that the two engines are highly complementary to each other and hence improve the prediction accuracy, which is 12.5% higher than that of the current best method in the literature.

For disulfide bond connectivity pattern prediction, we present a novel consensus model that predicts disulfide bonds given the known bonding states of the cysteines. The model fuses machine learning-based predictions with sequence alignment-based annotations. We improve the traditional machine learning-based model by introducing structural distance information as a feature. In addition, we propose, for the first time, a baseline predictor based on sequence alignment to assist the machine learning-based method. The disulfide bond connectivity pattern is predicted by maximizing the sum of the probabilities of the possible disulfide bonds. The results show that combining these two methods yields a robust final model with high prediction accuracy.
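The final step of the disulfide topic, choosing the connectivity pattern that maximizes the sum of bond probabilities, can be illustrated with a minimal sketch. It is not the thesis implementation: the function name `best_connectivity`, the residue positions, and the probability values are all hypothetical, and exhaustive enumeration is used only because the number of bonded cysteines is assumed to be small.

```python
from typing import Dict, List, Tuple

Pair = Tuple[int, int]


def best_connectivity(cysteines: List[int],
                      prob: Dict[Pair, float]) -> Tuple[float, List[Pair]]:
    """Pair up all cysteines so that the summed bond probability is maximal.

    `cysteines` holds sorted residue positions of bonded cysteines
    (hypothetical input); `prob[(i, j)]` with i < j is the predicted
    probability that residues i and j form a disulfide bond.
    """
    if len(cysteines) % 2 != 0:
        raise ValueError("an even number of bonded cysteines is required")
    if not cysteines:
        return 0.0, []

    first, rest = cysteines[0], cysteines[1:]
    best_score, best_pairs = float("-inf"), []
    # Try every partner for the first cysteine, then solve the remainder.
    for idx, partner in enumerate(rest):
        remaining = rest[:idx] + rest[idx + 1:]
        score, pairs = best_connectivity(remaining, prob)
        score += prob[(first, partner)]
        if score > best_score:
            best_score, best_pairs = score, [(first, partner)] + pairs
    return best_score, best_pairs


if __name__ == "__main__":
    # Made-up probabilities for four bonded cysteines.
    example_prob = {
        (5, 12): 0.20, (5, 30): 0.70, (5, 41): 0.10,
        (12, 30): 0.30, (12, 41): 0.80, (30, 41): 0.25,
    }
    print(best_connectivity([5, 12, 30, 41], example_prob))
```

Enumeration grows quickly with the number of cysteines; for larger proteins a maximum-weight perfect matching algorithm on the cysteine graph would be the usual substitute.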
Keywords/Search Tags: Protein structure modeling, Residue contact map, Disulfide bond, Correlated mutation analysis, Partial correlation analysis, Sequence alignment, Machine learning, Ensemble learning