Deep Modeling Protein Sequence And Its Application

Posted on: 2020-11-01
Degree: Doctor
Type: Dissertation
Country: China
Candidate: B Z Zhang
Full Text: PDF
GTID: 1360330578479815
Subject: Computer Science and Technology
Abstract/Summary:
The biological functions of a protein are largely determined by its three-dimensional structure. When the native three-dimensional structure of a protein is not available, simulating and analyzing its structure helps researchers quickly understand protein function, study the causes of disease, and reduce the number of biological experiments required. Inspired by the success of deep learning in many fields, we designed deep learning models based on the primary protein sequence for predicting solvent accessibility, secondary structure, backbone dihedral angles, and other important structural properties:

1. Study on prediction of protein solvent accessibility. In this work, a two-layer prediction model was proposed: first, we trained the SDBRNN model to predict the relative solvent accessible area, and then classified protein residues into two groups by applying a cutoff to the SDBRNN predictions. The SDBRNN model was constructed from bidirectional LSTM networks. The information from the two directions was merged by a merging operator, which improved the sharing of information between them.

2. Study on prediction of protein secondary structure. Although secondary structure is a local structure, it is also affected by long-range residues. Because recurrent neural networks can capture long-range information from a sequence and convolutional neural networks can capture local features, our model, named CRRNN, is built from both an RNN and a CNN. To avoid the parameter explosion caused by concatenating features with the input from the previous layer, a convolutional neural network with a one-dimensional step was used to reduce the dimensionality of the model. Ensemble learning was also used: ten independently trained models were integrated into one framework.

3. Study on prediction of protein-protein interaction sites. Interaction sites make up only a small fraction of the residues in a sequence, so this problem can be treated as an imbalanced classification problem. This thesis proposed three strategies to combat the imbalance: a refinement of the sampling granularity; a new penalization factor added to the loss function, following cost-sensitive learning; and multi-task learning of interaction sites and residue solvent accessibility to correct the model's preference for non-interaction sites. Taking the number of training samples into account, a simplified LSTM network (SLSTM) was proposed, and a deep learning model (DLPred) was designed based on the SLSTM unit.

4. Study on multi-task learning for predicting protein solvent accessibility, secondary structure, and backbone dihedral angles. Based on our previous research, a double-channel deep learning method (CRRNN2) was proposed for joint learning. CRRNN2 can concurrently predict solvent accessible areas, secondary structures, and dihedral angles. A simplified GRU (GRU2) serves as the hidden nodes of the bidirectional recurrent neural network, which is designed with a DenseNet structure. An improved Google Inception is used to build the other network channel.

The main contribution of this work is a set of deep learning models, based on protein sequences, for predicting the structural properties of proteins. For the different problems, we proposed a novel regression model, a classification model, a deep learning model for imbalanced data, and a multi-task model. In the process of modeling, a merging operator for bidirectional recurrent neural networks was proposed; the operator was successfully applied in many of the deep learning models in this thesis. With the generalization and parameter scale of deep learning models in mind, the simplified SLSTM and GRU2 networks were proposed, and the connection mode of the residual network was redesigned. In addition, an ensemble model was validated in our work and proved effective compared with the individual models. Overall, the experimental results show that our research will help researchers further develop predictors of protein structural properties. It also provides a useful reference for implementing and applying deep learning models in bioinformatics.
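The cost-sensitive strategy for interaction-site prediction can be illustrated with a small sketch. The abstract does not give the form of the penalization factor, so the code below is only an assumption: it uses a generic class-weighted binary cross-entropy, and the function name `weighted_bce`, the toy labels, and the negatives-to-positives weight are all hypothetical rather than taken from DLPred.

```python
import math

def weighted_bce(y_true, y_prob, pos_weight, eps=1e-12):
    """Mean binary cross-entropy with the positive-class term scaled by pos_weight."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(pos_weight * y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)

# Hypothetical toy data: 1 interaction site among 10 residues.
labels = [0, 0, 0, 1, 0, 0, 0, 0, 0, 0]
# A predictor that leans towards "non-interaction" for every residue.
probs = [0.1] * 10
# Assumed heuristic (not from the thesis): weight = negatives / positives.
pos_weight = labels.count(0) / labels.count(1)

plain = weighted_bce(labels, probs, pos_weight=1.0)
weighted = weighted_bce(labels, probs, pos_weight=pos_weight)
print(f"unweighted loss: {plain:.4f}, weighted loss: {weighted:.4f}")
```

With the weight applied, the missed interaction site contributes far more to the loss than any single non-interaction residue, which is the general mechanism by which cost-sensitive learning discourages a model from always predicting the majority class.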
Keywords/Search Tags:Deep learning, Solvent accessibility, Secondary structure, Backbone dihedral angle, Imbalanced classification, Multi-task learning, Recurrent neural network