
Protein Secondary Structure Prediction Based On A Balanced Classification Algorithm

Posted on: 2018-11-06
Degree: Master
Type: Thesis
Country: China
Candidate: N B Li
GTID: 2310330536964741
Subject: Software engineering
Abstract/Summary:
Proteins play a critical role in living processes as the material carriers of biological activity. The structure of a protein determines its function to a great extent; hence, predicting protein structure is an important step toward understanding protein function. Protein structure is described at four levels: primary, secondary, tertiary, and quaternary. The primary structure is the amino-acid sequence of the polypeptide chain. The secondary structure, which describes the local conformation of the polypeptide backbone, is in most cases reduced to three regular forms: α-helix (H), β-strand (E), and coil (C). The tertiary structure of a protein is largely determined by its primary structure, but methods that predict tertiary structure directly from sequence information rarely achieve the desired result. By contrast, methods that predict tertiary structure from secondary structure achieve quite satisfactory accuracy. The accuracy of β-strand prediction, however, remains relatively low.

We tried to make secondary structure prediction more effective by applying balancing strategies to the PSIPRED algorithm, a neural-network classifier that takes Position-Specific Scoring Matrices as its input features. The predicted secondary structure was then used to classify protein structural classes. We attempted to improve PSIPRED in four ways: weighting the outputs of the networks (OW-PSIPRED), adjusting the cost function during training (CFW-PSIPRED), using a balanced sampling strategy (BS-PSIPRED), and introducing long-distance interactions into the encoding mechanism (LI-PSIPRED). Of these, OW-PSIPRED performed best, reaching 63.73% accuracy on β-strand and an overall accuracy of 74.28% on the improved CB513 dataset; the overall accuracy differs only slightly from that of the original PSIPRED algorithm, while the β-strand accuracy is 2.34 percentage points higher than that of the original method.

We also constructed a protein structural class predictor based on an artificial neural network that takes the chaos game representation (CGR) as its input vectors. It obtained a satisfactory result of 71% on the Astral40 dataset, which is more effective than methods that use the CGR of the amino-acid sequence as their feature. We conclude that the proposed method classifies protein structural classes effectively.
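The abstract does not give the details of OW-PSIPRED, so the following is only a minimal sketch of the general idea of weighting network outputs: each residue's three class scores are multiplied by per-class weights before the largest is chosen, so a weight above 1 on E shifts borderline residues toward β-strand. The function names, the weight of 1.2, and the example scores are all illustrative assumptions, not values or code from the thesis.

```python
# Hypothetical sketch of output weighting for a three-state (H, E, C) predictor.
# The weights and scores below are illustrative, not values from the thesis.

CLASSES = ("H", "E", "C")


def weighted_predict(scores, weights):
    """Return the class label whose weighted score is largest.

    scores  -- per-class network outputs for one residue, ordered as CLASSES
    weights -- per-class multipliers; a weight > 1 favours that class
    """
    weighted = [s * w for s, w in zip(scores, weights)]
    return CLASSES[weighted.index(max(weighted))]


def predict_sequence(score_rows, weights=(1.0, 1.0, 1.0)):
    """Apply weighted_predict to every residue and join the labels."""
    return "".join(weighted_predict(row, weights) for row in score_rows)


# Made-up network outputs for four residues (columns: H, E, C).
example_scores = [
    (0.70, 0.10, 0.20),
    (0.20, 0.38, 0.42),
    (0.10, 0.44, 0.46),
    (0.15, 0.25, 0.60),
]

unweighted = predict_sequence(example_scores)
strand_boosted = predict_sequence(example_scores, weights=(1.0, 1.2, 1.0))
print(unweighted, strand_boosted)
```

In this toy input, the two residues where E narrowly loses to C switch to E once the strand weight is applied, while confident H and C calls are unchanged. In practice such weights would presumably be tuned on held-out data, trading some coil or helix accuracy for better strand recall.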
Keywords/Search Tags:Protein secondary structure prediction, Artificial neural network, Balanced classifier, Multi-classifier, Protein tertiary structure, Protein structural classes, Chaos games representation