Study On Deep Learning Model For Predicting Protein Secondary Structure Using Only Primary Sequences

Posted on: 2018-06-20
Degree: Master
Type: Thesis
Country: China
Candidate: H Wu
GTID: 2310330542460457
Subject: Computer technology
Abstract:
Protein secondary structure (SS) prediction is very important for studying protein structure and function. Based on an analysis of previous methods for protein secondary structure prediction, this paper argues that the secondary structure of a protein is affected to some extent by distant residues, and that prediction accuracy can be effectively improved by using both the long-range information and the evolutionary information of the protein sequence. At the same time, a protein sequence is in essence a string, so learning from protein sequences can be treated as a special case of sequence learning. On this basis, a deep learning model is proposed to predict protein secondary structure.

The main contributions of this paper are as follows:

(1) First, a distributed representation of amino acids is trained, which improves overall performance by 10% compared with untrained amino acid embeddings. Then, the pre-trained sequence representations are encoded by two long short-term memory (LSTM) networks, one forward and one backward. Finally, the resulting vector representation is used as input to a conditional random field (CRF) classifier that predicts the secondary structure of the protein. Throughout training, the model parameters are updated by backpropagation.

(2) This paper presents a deep learning model for predicting protein secondary structure from sequences. The model uses Word2Vec to transform the amino acid sequence into vectors, and then uses a deep neural network built from LSTM networks to obtain a fixed-length feature representation of the sequence. The features used by the prediction algorithm are learned rather than hand-designed, which avoids the excessive human intervention of traditional machine learning.

(3) Experimental results show that the model obtains ~73.9% Q3 accuracy and ~64.9% Q8 accuracy on the CullPDB test proteins, and ~63.5% Q8 accuracy on the public CB513 benchmark dataset.
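The Q3 and Q8 figures above are per-residue accuracies over three and eight secondary structure classes. As a minimal sketch of how such scores are typically computed, the snippet below assumes DSSP-style eight-state labels and a commonly used 8-to-3 reduction (H, G, I to helix; E, B to strand; everything else to coil). The mapping, the function names, and the toy label strings are illustrative assumptions, not taken from the thesis, which may use a different reduction.

```python
# Assumed 8-to-3 state reduction; the thesis may use a different convention.
Q8_TO_Q3 = {
    "H": "H", "G": "H", "I": "H",   # helix classes
    "E": "E", "B": "E",             # strand classes
    "T": "C", "S": "C", "L": "C",   # turn, bend, loop -> coil
}


def to_q3(labels):
    """Reduce a string of eight-state labels to three-state labels."""
    return "".join(Q8_TO_Q3[s] for s in labels)


def accuracy(pred, true):
    """Fraction of residues whose predicted label equals the true label."""
    if len(pred) != len(true):
        raise ValueError("prediction and reference must have equal length")
    if not true:
        raise ValueError("sequences must be non-empty")
    return sum(p == t for p, t in zip(pred, true)) / len(true)


# Toy example (hypothetical labels, not real data).
true_q8 = "LHHHHGGTEEEBSL"
pred_q8 = "LHHHHHHTEEEESL"

q8 = accuracy(pred_q8, true_q8)                 # 11 of 14 residues match
q3 = accuracy(to_q3(pred_q8), to_q3(true_q8))   # all mismatches vanish after reduction
print(f"Q8 = {q8:.3f}, Q3 = {q3:.3f}")
```

In this toy case the three Q8 errors are confusions within the same Q3 class (G vs. H, B vs. E), which illustrates why Q3 accuracy is usually higher than Q8 accuracy for the same predictions.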
Keywords: Protein secondary structure, Word2Vec, Long Short-Term Memory, Conditional Random Field, Deep Learning