Protein Secondary Structure Prediction Using Conditional Random Fields And Deep Learning

Posted on: 2020-11-26
Degree: Master
Type: Thesis
Country: China
Candidate: L L Wang
GTID: 2370330575987990
Subject: Computer application technology
Abstract/Summary:
With the development of bioinformatics, the volume of protein sequence information in protein databases has grown explosively, and this information can now be put to better use in understanding biological systems. Sequence information can be used to find related proteins and, together with other information, to infer the structure and function of unknown proteins. Protein structure analysis and prediction are also widely used in drug design. At present, determining protein secondary structure experimentally is costly, and there is a shortage of qualified specialists. The core problem is therefore to find an efficient bioinformatics algorithm for predicting protein secondary structure.

In this thesis, deep learning and conditional random field (CRF) algorithms are used to predict protein secondary structure. Protein data are processed with the Position-Specific Scoring Matrix (PSSM), and a sliding window is applied to better represent the amino acid sequence. Two classification methods are proposed.

The first combines a convolutional neural network (CNN) with a Softmax classifier and modifies the CNN model structure. To address the vanishing gradient problem, a Rectified Linear Unit (ReLU) activation layer is added after each convolutional layer. To retain the important features of the original data as far as possible, the feature data preceding the full convolution layer are extracted and used as the input of the Softmax classifier, which then classifies and predicts the protein secondary structure. Compared with the traditional CNN method, this method improves prediction accuracy.

The second is based on ensemble learning: a simple ensemble strategy combines the CNN with a CRF model so that the two learners can make the most of their respective strengths and compensate for each other's weaknesses. The resulting ensemble classifier is then used to classify and predict protein secondary structure, which improves prediction accuracy.

Experiments show that both proposed methods improve prediction accuracy on the public protein dataset 25PDB, and that the ensemble learner composed of the CNN and the CRF model achieves higher prediction accuracy than the CNN-Softmax network model on the same dataset. Combining deep learning with a conditional random field model can therefore improve the prediction accuracy of protein secondary structure.
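
The abstract does not give implementation details, so the following is only a minimal sketch of two of the steps it names: turning a PSSM into per-residue sliding-window features, and combining two learners with a simple ensemble rule. The function names, the default window size of 13, zero-padding at the sequence ends, and the use of probability averaging (soft voting) are all assumptions made for illustration, not the thesis's actual choices. It also assumes that both the CNN and the CRF can supply per-residue class probabilities.

```python
from typing import List, Sequence


def sliding_windows(pssm: Sequence[Sequence[float]],
                    window: int = 13) -> List[List[float]]:
    """Build one flattened feature vector per residue.

    Each vector concatenates the PSSM rows of the residue and of its
    window // 2 neighbours on either side. Positions that fall outside
    the sequence are filled with zero rows (an assumed padding scheme).
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    n_cols = len(pssm[0]) if pssm else 0
    zero_row = [0.0] * n_cols
    features = []
    for i in range(len(pssm)):
        vec: List[float] = []
        for j in range(i - half, i + half + 1):
            row = pssm[j] if 0 <= j < len(pssm) else zero_row
            vec.extend(row)
        features.append(vec)
    return features


def soft_vote(probs_a: Sequence[Sequence[float]],
              probs_b: Sequence[Sequence[float]]) -> List[int]:
    """Average two per-residue class distributions and return the argmax.

    This is plain soft voting, used here as a stand-in for the
    unspecified "simple ensemble strategy".
    """
    labels = []
    for a, b in zip(probs_a, probs_b):
        mean = [(x + y) / 2.0 for x, y in zip(a, b)]
        labels.append(max(range(len(mean)), key=mean.__getitem__))
    return labels


if __name__ == "__main__":
    # Toy 3-residue "PSSM" with 2 columns; a real PSSM has 20 columns.
    toy_pssm = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
    for vec in sliding_windows(toy_pssm, window=3):
        print(vec)

    # Hypothetical per-residue probabilities over three classes.
    cnn_probs = [[0.6, 0.3, 0.1], [0.2, 0.5, 0.3]]
    crf_probs = [[0.2, 0.7, 0.1], [0.1, 0.3, 0.6]]
    print(soft_vote(cnn_probs, crf_probs))
```

With a real 20-column PSSM and a window of 13, each residue would be represented by 260 values, which could be reshaped to 13 x 20 before being passed to a convolutional layer.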
Keywords: protein secondary structure, convolutional neural network, conditional random fields, ensemble learning, softmax