Protein Secondary Structure Prediction Using Conditional Random Fields And Deep Learning

Posted on:2020-11-26

Degree:Master

Type:Thesis

Country:China

Candidate:L L Wang

GTID:2370330575987990

Subject:Computer application technology

Abstract/Summary:

With the development of bioinformatics, the amount of sequence information in protein databases has exploded, and researchers can now make better use of this information to understand biological systems. Sequence information can be used to find related proteins and, combined with other information, to infer the structure and function of unknown proteins. Protein structure analysis and prediction are also widely used in drug design. At present, determining protein secondary structure experimentally is costly, and there is a shortage of professionals able to do it. The core problem is therefore to find an efficient computational algorithm for predicting protein secondary structure. In this thesis, deep learning and conditional random field (CRF) algorithms are used to predict protein secondary structure. The protein data are represented with the Position-Specific Scoring Matrix (PSSM), and a sliding window technique is used to better represent the amino acid sequence. Two classification methods are proposed for the prediction task. The first combines a convolutional neural network (CNN) with a Softmax classifier and improves on the standard CNN model structure. To address the vanishing gradient problem, a Rectified Linear Unit (ReLU) activation layer is added after each convolution layer. To retain the important features of the original data as far as possible, the feature data preceding the whole convolution layer is extracted and used as the input to the Softmax classifier, which then classifies and predicts the protein secondary structure. Compared with the traditional CNN method, this method improves prediction accuracy. The second method is based on ensemble learning: a simple ensemble strategy combines the CNN with a CRF model, so that the two learners can make the most of their respective advantages and compensate for each other's shortcomings. The resulting ensemble classifier is then used to predict protein secondary structure, which further improves prediction accuracy. Experiments show that both proposed methods improve prediction accuracy on the public protein dataset 25PDB. They also show that the ensemble learner composed of the CNN and the CRF model achieves higher prediction accuracy on 25PDB than the CNN-Softmax network model. The combination of deep learning and a conditional random field model can therefore improve the accuracy of protein secondary structure prediction.
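
The abstract does not give implementation details, so the following is only a minimal Python sketch of two of the steps it names: building sliding-window features from a PSSM, and combining two learners with a simple ensemble rule. The window size, the zero padding at the chain ends, the three-state labels H/E/C, the use of probability averaging as the ensemble strategy, and the function names `sliding_windows` and `average_ensemble` are all assumptions made for illustration, not the thesis's actual method.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def sliding_windows(pssm: Matrix, window: int = 13) -> Matrix:
    """Return one flattened feature vector per residue.

    Each vector concatenates the PSSM rows of the `window` residues centred
    on that position. Positions beyond either end of the chain are filled
    with zeros (an assumed padding scheme).
    """
    if window % 2 == 0:
        raise ValueError("window must be odd so it has a central residue")
    half = window // 2
    n_cols = len(pssm[0]) if pssm else 0
    pad = [0.0] * n_cols
    features = []
    for i in range(len(pssm)):
        vec: List[float] = []
        for j in range(i - half, i + half + 1):
            row = pssm[j] if 0 <= j < len(pssm) else pad
            vec.extend(row)
        features.append(vec)
    return features


def average_ensemble(
    p_cnn: Sequence[Sequence[float]],
    p_crf: Sequence[Sequence[float]],
    labels: Sequence[str] = ("H", "E", "C"),
) -> str:
    """Average per-residue class probabilities from two models.

    Averaging is an assumed stand-in for the "simple ensemble strategy";
    the predicted state for each residue is the class with the highest mean.
    """
    predicted = []
    for a, b in zip(p_cnn, p_crf):
        mean = [(x + y) / 2 for x, y in zip(a, b)]
        predicted.append(labels[mean.index(max(mean))])
    return "".join(predicted)
```

With a real PSSM of 20 columns and a window of 13, each residue would be described by 260 values. The per-residue probabilities passed to `average_ensemble` would come from whatever CNN and CRF models are trained on those features.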

Keywords/Search Tags:

protein secondary structure, convolutional neural network, conditional random fields, ensemble learning, softmax

Related items

1	Research Of Protein Secondary Structure Prediction Based On Ensemble Learning
2	Algorithm Research Of Protein Secondary Structure Prediction Based On Grouped Multi-Classifier
3	Application Of Machine Learning Algorithm In Protein Structure Prediction
4	Building Extraction From High Resolution Remote Sensing Images Based On Convolution Neural Network
5	Research On Protein Secondary Structure Prediction Based On Deep Learning Method
6	Prediction Of Protein Secondary Structure Based On Deep Learning
7	Application Of Deep Learning Algorithm In Protein Structure Prediction
8	Research On Prediction Algorithm Of Protein Secondary Structure Based On Neural Network
9	Research On Prediction Of Protein-protein Interactions Based On Deep Neural Network And Ensemble Learning
10	Prediction Of Protein-protein Interaction By Ensemble Neural Network And New Coding Method