Study On Deep Learning Model For Predicting Protein Secondary Structure Using Only Primary Sequences

Posted on:2018-06-20

Degree:Master

Type:Thesis

Country:China

Candidate:H Wu

Full Text:PDF

GTID:2310330542460457

Subject:Computer technology

Abstract/Summary:

Protein secondary structure (SS) prediction is important for studying protein structure and function. Based on an analysis of previous methods for protein secondary structure prediction, this thesis argues that secondary structure is affected to some extent by distant residues, and that prediction accuracy can be improved by exploiting both the long-range information and the evolutionary information in a protein sequence. At the same time, a protein sequence is in essence a string of characters, so learning from protein sequences can be treated as a special case of sequence learning. A deep learning model is therefore proposed to predict protein secondary structure. The main contributions of this thesis are as follows:

(1) First, a distributed representation (embedding) model of the amino acids in a sequence was built; using pre-trained embeddings improved overall performance by 10% compared with untrained amino acid embeddings. Then, the pre-trained sequence representations were processed by two long short-term memory (LSTM) networks, one forward and one backward. Finally, the resulting vector representation is used as input to a conditional random field (CRF) classifier, which predicts the secondary structure of the protein. Throughout training, the model parameters are updated by backpropagation.

(2) This thesis presents a deep learning model for predicting protein secondary structure from sequences. The model uses Word2Vec to transform the amino acid sequence into vectors, and then uses a deep neural network built from LSTM networks to obtain a fixed-length feature representation of the sequence. The features used by the prediction algorithm are learned rather than hand-designed, which avoids the heavy human intervention required in traditional machine learning.

(3) Experimental results show that the model obtains ~73.9% Q3 accuracy and ~64.9% Q8 accuracy on the CullPDB test proteins, and ~63.5% Q8 accuracy on the public CB513 benchmark dataset.
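
The Q3 and Q8 figures above are per-residue accuracies over three and eight secondary-structure classes. The following is a minimal sketch of how such scores could be computed, not the thesis's evaluation code. The function names are hypothetical, and the 8-to-3 state reduction shown (H, G, I to helix; E, B to strand; everything else to coil) is one common convention that the abstract does not confirm.

```python
# Assumed reduction from 8 DSSP-style states to 3 states. This is one
# common convention; the thesis abstract does not say which mapping it uses.
EIGHT_TO_THREE = {
    "H": "H", "G": "H", "I": "H",            # helix types -> H
    "E": "E", "B": "E",                      # strand types -> E
    "T": "C", "S": "C", "L": "C", "-": "C",  # turn, bend, loop/other -> C
}


def reduce_to_three(ss8):
    """Map an 8-state label string to a 3-state label string."""
    return "".join(EIGHT_TO_THREE[state] for state in ss8)


def per_residue_accuracy(true_ss, pred_ss):
    """Fraction of residues whose predicted label equals the true label."""
    if len(true_ss) != len(pred_ss):
        raise ValueError("label strings must have the same length")
    if not true_ss:
        raise ValueError("label strings must be non-empty")
    correct = sum(t == p for t, p in zip(true_ss, pred_ss))
    return correct / len(true_ss)


def q8_and_q3(true_ss8, pred_ss8):
    """Return (Q8, Q3) for one protein given 8-state label strings."""
    q8 = per_residue_accuracy(true_ss8, pred_ss8)
    q3 = per_residue_accuracy(reduce_to_three(true_ss8),
                              reduce_to_three(pred_ss8))
    return q8, q3


# Made-up labels for illustration only (not data from the thesis).
true_labels = "LHHHGGEEBT"
pred_labels = "LHHHHHEEET"
q8, q3 = q8_and_q3(true_labels, pred_labels)
print(f"Q8 = {q8:.2f}, Q3 = {q3:.2f}")  # prints "Q8 = 0.70, Q3 = 1.00"
```

In this toy example the three Q8 errors (G predicted as H twice, B predicted as E) fall within the same 3-state class, which is why Q3 is higher than Q8, as it is in the reported results.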

Keywords/Search Tags:

Protein secondary structure, Word2Vec, Long Short-Term Memory, Conditional Random Field, Deep Learning

Related items

1	Prediction Of Protein Signal Peptide Based On Domain Rules And Hybrid Deep Learning Model
2	Research On Meteorological Prediction Based On Long Short-term Memory Network
3	Protein Secondary Structure Prediction Based On Generative Adversarial Network And Bidirectional Long Short-term Memory Recurrent Network
4	Protein Secondary Structure Prediction Using Conditional Random Fields And Deep Learning
5	Research On Short-term Wind Field Forecast And Correction Based On Machine Learning
6	Analysis And Application Of Deep Learning Long-term And Short-term Memory Algorithms And Monte Carlo Method
7	Study On Key Problems Of Protein-Ligand Docking Based On Machine Learning
8	Research On Flash Flood Forecasting Based On Long Short-Term Memory Networks
9	Research On Protein Domain Boundary Prediction Based On Deep Learning
10	Deep Learning And Its Application In Geoelectric Field Anomaly Detection