
Random Forest Based Transmembrane Helix Contact Prediction And Coiled Coil Oligomeric State Prediction

Posted on: 2015-01-07
Degree: Doctor
Type: Dissertation
Country: China
Candidate: X F Wang
GTID: 1260330428960611
Subject: Bioinformatics
Abstract/Summary:
Machine learning methods are computer programs that use sample data or past experience to solve a given problem, and they are becoming increasingly important in the development of bioinformatics algorithms. This thesis presents work on predicting protein structural properties with the random forest machine learning method, covering transmembrane helix contact prediction and coiled coil oligomeric state prediction.
α-Helical membrane proteins play crucial roles in diverse organisms. Many important life processes, such as energy conversion, signal recognition and transmission, and material transport, are closely related to α-helical membrane proteins. Structural information on these proteins is very helpful for understanding their functional mechanisms. Because determining the structures of α-helical membrane proteins experimentally remains technically difficult, computational methods for predicting their structures are highly desirable. Transmembrane helix contact prediction aims to predict whether a pair of residues located on two different transmembrane helices is in contact. Predicted contacts can be used as constraints to improve structure prediction for α-helical membrane proteins. The author therefore developed a new method, TMhhcp, to predict transmembrane helix contacts. First, five statistical approaches based on residue coevolution information were used to predict transmembrane helix contacts. The results of three of the statistical methods were then combined with other features to build a random forest based predictive model, TMhhcp, which performs better than two other methods, TMHcon and MEMPACK. Two definitions of transmembrane helix contacts were used when building the predictive model. Under the two definitions, the prediction accuracies are 48.1% and 47.3%, respectively, which exceed the corresponding accuracies of TMHcon and MEMPACK.
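
The abstract does not name the five coevolution-based statistics. As one familiar example of such a statistic, the sketch below computes the mutual information between two columns of a multiple sequence alignment; a high value suggests that the two positions vary together. The function name `mutual_information` and the toy alignment are illustrative assumptions and are not taken from the thesis.

```python
import math
from collections import Counter


def mutual_information(col_i, col_j):
    """Mutual information (in bits) between two alignment columns.

    Illustrative only: the thesis does not state which coevolution
    statistics TMhhcp uses.
    """
    if len(col_i) != len(col_j):
        raise ValueError("columns must have the same number of sequences")
    n = len(col_i)
    freq_i = Counter(col_i)             # residue counts at position i
    freq_j = Counter(col_j)             # residue counts at position j
    freq_ij = Counter(zip(col_i, col_j))  # joint residue-pair counts
    mi = 0.0
    for (a, b), count in freq_ij.items():
        p_ab = count / n
        p_a = freq_i[a] / n
        p_b = freq_j[b] / n
        mi += p_ab * math.log2(p_ab / (p_a * p_b))
    return mi


# Hypothetical toy alignment: four sequences, four positions.
alignment = ["ALGV", "ALGV", "VIGL", "VIGL"]
columns = list(zip(*alignment))  # transpose rows into columns

# Positions 0 and 1 change together; position 2 is fully conserved.
print(mutual_information(columns[0], columns[1]))
print(mutual_information(columns[0], columns[2]))
```

In this toy case the covarying pair scores 1 bit and the pair involving the conserved column scores 0. In a real pipeline such scores would be one input feature among several, and raw mutual information is usually corrected for background and phylogenetic effects before use.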
The predicted contacts were further used to predict interacting helices, yielding MCC values of 0.430 and 0.424 under the two helix contact definitions. A comparison of the prediction results shows that the three methods, TMHcon, MEMPACK and TMhhcp, are complementary to a large extent, suggesting that integrating them should yield better performance.
The coiled coil is a distinctive protein structural motif consisting of two or more α-helices that wind around each other to form a rope-like structure. Owing to this structural arrangement, coiled coil sequences display a heptad repeat pattern, and many computational methods have been developed to analyze coiled coils. In this thesis, a new predictor, RFCoil, was devised to distinguish parallel dimeric and trimeric coiled coils. The author first used random forest, with the physical and chemical properties of the 20 amino acids as features, to build a primary predictive model. Based on the importance the model assigned to each feature, the most important and non-redundant features were then selected to build a more compact classifier. RFCoil was compared with two other prediction methods, SCORER 2.0 and PrOCoil, and showed better predictive performance than both. Some important rules were also extracted from the random forest model to aid protein molecular design.
To serve academic users, online prediction servers for TMhhcp and RFCoil were built; they are available at http://protein.cau.edu.cn/tmhhcp and http://protein.cau.edu.cn/RFCoil. The author also hopes that the work presented in this thesis will help strengthen our understanding of the relationship between protein sequences and structures.
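
The MCC values reported for interacting-helix prediction can be read with a small worked example. The sketch below assumes MCC denotes the Matthews correlation coefficient, its usual meaning, and computes it from confusion-matrix counts. The function name `mcc` and the counts are hypothetical and are not results from the thesis.

```python
import math


def mcc(tp, fp, tn, fn):
    """Matthews correlation coefficient from confusion-matrix counts.

    Ranges from -1 (total disagreement) through 0 (no better than
    chance) to +1 (perfect prediction).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Common convention when a row or column of the matrix is empty.
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical counts for helix pairs predicted as interacting or not.
example = mcc(tp=30, fp=20, tn=130, fn=20)
print(round(example, 3))
```

Because it uses all four cells of the confusion matrix, MCC stays informative when interacting helix pairs are much rarer than non-interacting ones, which is the usual situation in contact prediction.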
Keywords/Search Tags: machine learning, random forest, transmembrane helix contacts, coiled coil oligomeric states, prediction