
Research on Protein Secondary Structure Prediction Based on Deep Learning Method

Posted on: 2019-11-02
Degree: Master
Type: Thesis
Country: China
Candidate: Z S Zhao
GTID: 2370330566998901
Subject: Computer Science and Technology
Abstract/Summary:
The rapid development of bioinformatics has brought revolutionary progress to biological science. The secondary structure of a protein is the basis for understanding the relationship between its structure and its function, and it is essential information for the design of drugs and enzymes. However, determining protein secondary structure experimentally is costly, which makes it difficult to apply widely. Over recent decades, researchers have turned to machine learning to predict protein secondary structure, but prediction accuracy remains limited, and improving it has become a popular research topic in bioinformatics. Because the formation of protein secondary structure is influenced by many factors, this study investigates the prediction problem from three aspects: amino acid representation, the secondary structure prediction model, and a prediction model that incorporates information about protein spatial structure. The main contributions of this thesis are as follows.

The representation of amino acids is the foundation of protein secondary structure prediction. The current mainstream representation is the one-hot vector, which is sparse and encodes little information about each amino acid beyond which residue it is. To capture both the physicochemical properties and the evolutionary information of each amino acid, this thesis concatenates a learned amino acid embedding with the amino acid's PSSM (Position-Specific Scoring Matrix) profile, converting the protein sequence into a matrix with stronger representational capability.

To extract both the local context and the long-range dependencies of each amino acid, we introduce a gate mechanism into a convolutional neural network framework and propose a new protein secondary structure prediction model named CNNH_PSS. Evaluations on the CB6133 and CB513 benchmark datasets show that CNNH_PSS outperforms state-of-the-art models; in particular, its training converges nearly 50 times faster.

Because the formation of protein secondary structure is also influenced by the structural properties of the protein, we further introduce the task of protein solvent accessibility prediction and apply multi-task learning so that solvent accessibility information improves secondary structure prediction. To address the insufficient use of related-task information in existing multi-task learning frameworks, we propose an end-to-end iterative multi-task framework. Evaluations on the CB6133 and CB513 benchmark datasets show that this method achieves higher accuracy than the CNNH_PSS model and, to the best of our knowledge, the highest performance reported on these datasets.
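The representation step described above (a learned embedding concatenated with each residue's PSSM profile) can be sketched in a few lines. This is a minimal illustration, not the thesis's code: the names `make_embedding_table` and `encode_sequence`, the embedding size, and the random initialization standing in for trained weights are all hypothetical, and the PSSM is assumed to be precomputed as 20 scores per residue.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues

def make_embedding_table(dim, seed=0):
    """Hypothetical embedding table; random values stand in for learned weights."""
    rng = random.Random(seed)
    return {aa: [rng.uniform(-0.1, 0.1) for _ in range(dim)] for aa in AMINO_ACIDS}

def encode_sequence(seq, pssm, table):
    """Return one feature row per residue: its embedding followed by its PSSM row.

    seq: string over the 20 standard residues (non-standard residues are not handled).
    pssm: list with one row of 20 scores per residue (assumed precomputed).
    """
    if len(seq) != len(pssm):
        raise ValueError("PSSM must have one row per residue")
    features = []
    for aa, pssm_row in zip(seq, pssm):
        if len(pssm_row) != 20:
            raise ValueError("each PSSM row must have 20 scores")
        # List concatenation builds a new row: [embedding..., pssm scores...]
        features.append(table[aa] + list(pssm_row))
    return features
```

With an 8-dimensional embedding, a sequence of length L becomes an L x 28 matrix that a convolutional model can consume directly.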
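The abstract does not spell out the exact gate used in CNNH_PSS. Assuming a highway-style gate, in which a sigmoid transform gate T mixes a convolutional transform H(x) with the unchanged input, y = T * H(x) + (1 - T) * x, one gated convolutional layer might look like the sketch below. Pure Python is used for readability; the function names and the weight layout `[k][C_in][C_out]` are hypothetical.

```python
import math

def sigmoid(v):
    """Numerically stable logistic function."""
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    e = math.exp(v)
    return e / (1.0 + e)

def conv1d_same(x, weights, bias):
    """'Same'-padded 1-D convolution over a sequence.

    x: L rows of C_in features; weights: [k][C_in][C_out] with odd k;
    bias: C_out values. Positions outside the sequence count as zeros.
    """
    k = len(weights)
    if k % 2 == 0:
        raise ValueError("kernel size must be odd for symmetric padding")
    pad = k // 2
    c_in, c_out = len(x[0]), len(bias)
    out = []
    for t in range(len(x)):
        row = list(bias)
        for j in range(k):
            pos = t + j - pad
            if 0 <= pos < len(x):
                for ci in range(c_in):
                    for co in range(c_out):
                        row[co] += x[pos][ci] * weights[j][ci][co]
        out.append(row)
    return out

def highway_conv_layer(x, w_h, b_h, w_t, b_t):
    """Highway-gated convolution: y = T * ReLU(conv_H(x)) + (1 - T) * x,
    with T = sigmoid(conv_T(x)). C_out must equal C_in for the carry path."""
    if len(b_h) != len(x[0]) or len(b_t) != len(x[0]):
        raise ValueError("output channels must match input channels")
    h = conv1d_same(x, w_h, b_h)
    t = conv1d_same(x, w_t, b_t)
    y = []
    for x_row, h_row, t_row in zip(x, h, t):
        out_row = []
        for xv, hv, tv in zip(x_row, h_row, t_row):
            gate = sigmoid(tv)  # transform gate in (0, 1)
            out_row.append(gate * max(0.0, hv) + (1.0 - gate) * xv)
        y.append(out_row)
    return y
```

When the gate is nearly closed the layer passes its input through almost unchanged, which is one common motivation for gated layers in deep convolutional stacks; a practical model would use a deep learning framework rather than nested loops.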
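The multi-task idea can be illustrated with a joint training loss. As an assumption for illustration (the abstract does not give the loss), the sketch averages per-residue softmax cross-entropy for secondary structure classes (for example, eight DSSP states) and for a discretized solvent accessibility label, weighting the auxiliary task by a hypothetical `sa_weight`. It does not model the iterative framework the thesis proposes.

```python
import math

def cross_entropy(logits, target):
    """Softmax cross-entropy for one residue, computed stably via log-sum-exp."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(v - m) for v in logits))
    return log_z - logits[target]

def multitask_loss(ss_logits, ss_labels, sa_logits, sa_labels, sa_weight=0.5):
    """Mean per-residue loss: secondary structure plus weighted solvent accessibility.

    ss_logits / sa_logits: one list of class scores per residue.
    ss_labels / sa_labels: one integer class index per residue.
    sa_weight: hypothetical weight for the auxiliary task.
    """
    n = len(ss_labels)
    if n == 0 or n != len(sa_labels):
        raise ValueError("need the same, non-zero number of residues for both tasks")
    ss = sum(cross_entropy(l, y) for l, y in zip(ss_logits, ss_labels)) / n
    sa = sum(cross_entropy(l, y) for l, y in zip(sa_logits, sa_labels)) / n
    return ss + sa_weight * sa
```

Because both tasks share the same input features, gradients from the solvent accessibility term also shape the shared layers, which is the basic mechanism by which an auxiliary task can help the main prediction.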
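Results on benchmarks such as CB6133 and CB513 are commonly reported as per-residue accuracy (Q8 for eight states). The abstract does not name its metric, so the sketch below is an assumption: it counts correctly labelled residues while skipping padded positions, since sequences are often padded to a fixed length for batching. The function name and the mask convention (1 for a real residue, 0 for padding) are hypothetical.

```python
def residue_accuracy(pred, true, mask):
    """Fraction of real (unpadded) residues whose predicted label matches the true one.

    pred, true: per-sequence lists of state labels; mask: per-sequence lists of 0/1.
    """
    correct = total = 0
    for p_seq, t_seq, m_seq in zip(pred, true, mask):
        for p, t, m in zip(p_seq, t_seq, m_seq):
            if m:
                total += 1
                correct += p == t  # bool counts as 0 or 1
    return correct / total if total else 0.0
```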
Keywords/Search Tags: protein secondary structure prediction, amino acid representation learning, convolutional neural network, gate mechanism, multi-task learning