Prediction Of Polyproline Type Ⅱ Secondary Structure By Machine Learning Approaches

Posted on: 2006-07-30
Degree: Master
Type: Thesis
Country: China
Candidate: K Z Lu
Full Text: PDF
GTID: 2120360182965482
Subject: Computer application technology
Abstract/Summary:
Three-dimensional protein structures are needed in biomedicine because a protein's function is closely tied to its structure. Unfortunately, a protein's structure is harder to determine than its amino acid sequence, because the details of the structure lie at the atomic level. On the premise that a protein's three-dimensional structure is determined by its sequence, researchers look for a shortcut to protein structure and function through amino acid sequences.

This dissertation predicts the polyproline type II (PPII) helix from the amino acid sequence alone. The PPII helix is a rare kind of secondary structure. Unlike the alpha-helix and beta-sheet, the PPII structure is not included in any protein structure database, for lack of a hydrogen-bond theoretical model. First, the dissertation culls high-resolution, low-identity protein entries from the Protein Data Bank (PDB) using the PISCES server. It then separates PPII helix structures from non-PPII structures by their torsion angles, according to the definition of PPII, to form a two-class data set. The data set is severely imbalanced, because the PPII class accounts for only two percent of it, whereas the machine learning algorithms need a balanced data set. To equalize the two classes, non-PPII structures are pruned at random. Finally, the machine learning algorithms are trained and tested on the resulting data set.

The dissertation classifies PPII with an Artificial Neural Network (ANN), a Genetic Neural Network (GNN), and a Support Vector Machine (SVM) in turn. GNN and SVM are applied to PPII helix prediction here for the first time. In comparison, the hybrid algorithm that combines a genetic algorithm (GA) with back-propagation (BP) trains the BP model better than BP alone, and the SVM model, which is based on Statistical Learning Theory (SLT), gives the best result.
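The labeling and balancing steps described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the reference angles (phi near -75°, psi near 145°) are the commonly quoted values for an ideal PPII helix, while the 30° tolerance, the per-residue labeling, and the function names `is_ppii` and `undersample_negatives` are hypothetical choices, not the definition or code used in the thesis.

```python
import random

# Commonly quoted ideal PPII backbone angles (degrees). The tolerance is a
# hypothetical value chosen for illustration, not the thesis's definition.
PPII_PHI, PPII_PSI = -75.0, 145.0
TOLERANCE = 30.0


def angle_diff(a, b):
    """Smallest absolute difference between two angles, in degrees."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def is_ppii(phi, psi, tol=TOLERANCE):
    """Label one (phi, psi) pair as PPII if both angles fall within tol."""
    return angle_diff(phi, PPII_PHI) <= tol and angle_diff(psi, PPII_PSI) <= tol


def undersample_negatives(samples, labels, seed=0):
    """Keep every positive sample and a random subset of negatives of the
    same size, so that the two classes are balanced."""
    rng = random.Random(seed)
    pos = [s for s, y in zip(samples, labels) if y]
    neg = [s for s, y in zip(samples, labels) if not y]
    kept_neg = rng.sample(neg, min(len(pos), len(neg)))
    balanced = [(s, True) for s in pos] + [(s, False) for s in kept_neg]
    rng.shuffle(balanced)  # avoid class-ordered training data
    return balanced


if __name__ == "__main__":
    # Toy data: 2 PPII-like pairs and 98 alpha-helix-like pairs (about 2%).
    angles = [(-75.0, 145.0)] * 2 + [(-60.0, -45.0)] * 98
    labels = [is_ppii(phi, psi) for phi, psi in angles]
    balanced = undersample_negatives(angles, labels)
    print(len(balanced), sum(1 for _, y in balanced if y))
```

Random pruning discards most of the non-PPII examples, so a different seed gives a different training set; fixing the seed, as the sketch does, keeps a run reproducible.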
With a Gaussian radial basis function (RBF) width σ = 5, a penalty parameter C = 100, and an input window length l = 1, the SVM achieves its best test result: Sensitivity = 78.1%, Specificity = 74.9%, and overall accuracy Q = 76.5%. In addition, a new coding method is developed in this dissertation. When the ANN is trained and tested on the data set encoded with this method, it predicts the PPII class better, and the Sensitivity reaches 74.5%. However, the non-PPII class is then harder to predict, so the overall accuracy is lower.
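To make the reported quantities concrete, the sketch below computes a Gaussian RBF kernel value and the three evaluation measures from a confusion matrix. It rests on assumptions: the kernel is written as exp(-||x - y||² / (2σ²)), which is one common parameterization and may differ from the one used in the thesis, and the confusion-matrix counts are hypothetical numbers for a balanced test set of 1000 examples per class, chosen only to reproduce the reported percentages.

```python
import math


def rbf_kernel(x, y, sigma=5.0):
    """Gaussian RBF kernel, assuming the form exp(-||x - y||^2 / (2 sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def binary_metrics(tp, fn, tn, fp):
    """Return (sensitivity, specificity, overall accuracy Q)."""
    sensitivity = tp / (tp + fn)          # fraction of PPII found
    specificity = tn / (tn + fp)          # fraction of non-PPII rejected
    q = (tp + tn) / (tp + fn + tn + fp)   # fraction of all cases correct
    return sensitivity, specificity, q


if __name__ == "__main__":
    # Hypothetical counts: 1000 PPII and 1000 non-PPII test examples.
    sens, spec, q = binary_metrics(tp=781, fn=219, tn=749, fp=251)
    print(f"Sensitivity={sens:.3f} Specificity={spec:.3f} Q={q:.3f}")
```

On a balanced test set, Q is the mean of Sensitivity and Specificity, and (78.1% + 74.9%) / 2 = 76.5% agrees with the reported overall accuracy.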
Keywords/Search Tags: Bioinformatics, Protein Secondary Structure, Polyproline type II Helix Structure, Prediction of Polyproline type II Helix Structure, Artificial Neural Network, Genetic Neural Network, Support Vector Machine