
Protein Secondary Structure Prediction Based On Generative Adversarial Network And Bidirectional Long Short-term Memory Recurrent Network

Posted on: 2022-04-18
Degree: Master
Type: Thesis
Country: China
Candidate: X Y Han
GTID: 2480306323960699
Subject: Computer application technology
Abstract/Summary:
Proteins are the main carriers of life activities. Fully understanding their structures and interactions is of great significance for treating diseases and developing new drugs. However, the number of known proteins is increasing rapidly, so it is difficult to determine all of their three-dimensional structures by direct experimental observation. How to predict the tertiary structure of proteins has therefore become an active research direction. This study uses generative adversarial networks (GAN) and bidirectional long short-term memory recurrent networks (BiLSTM) to predict the secondary structure of proteins. Secondary structure acts as a bridge between the amino acid sequence and the tertiary structure, so this work offers a new approach to the difficulty of predicting tertiary structure directly from the amino acid sequence. The main work of this thesis is as follows:

(1) Experimental evaluation of protein secondary structure prediction servers. Seven popular prediction servers were selected: PSRSM, SPOT-1D, MUFOLD, SPIDER3, RAPTORX, PSIPRED and JPRED4. A total of 180 proteins were selected from the public protein database PDB and divided into three groups according to homology. The servers were evaluated from four perspectives: Q3, SOV, boundary recognition rate and internal recognition rate. The usage and prediction principle of each server are also described and summarized. This study identifies the differences between the methods and their respective advantages. The experiments show that PSRSM achieved the best results on all of the evaluation measures.

(2) Prediction based on BiLSTM and 42 radical group features. This work exploits the strength of BiLSTM in processing sequential data: the complete protein sequence is taken as input to capture long-distance interactions between amino acids while retaining the influence of both the preceding and the following context. For feature selection, 42 radical group codes are added to the commonly used position-specific scoring matrix (PSSM), and the large data set CULLPDB is used for training. On the common test sets CASP9, CASP10, CASP11 and CASP12, the Q3 accuracy is 85.74%, 86.83%, 84.73% and 83.79% respectively. The experimental results show that adding the 42 radical group codes and training on complete sequences effectively improves prediction accuracy.

(3) Combined model prediction based on GAN and BiLSTM. This is the first time that GAN and BiLSTM have been combined to predict protein secondary structure. A properly trained GAN can map random noise onto the distribution of real data. This property is used to generate new data that approximates the distribution of real protein structures, which is then combined with PSSM and classified by the BiLSTM. Two combined models are designed. The first builds the GAN from a convolutional neural network (CNN) and combines it with BiLSTM, using sliding windows to divide the data into fixed-length segments. The second builds the GAN from a fully connected network and combines it with BiLSTM, removing the sliding window restriction. The two models are compared in terms of Q3 and SOV. The experimental results show that the model with complete sequence input classifies better. The model is trained with a combination of supervised and unsupervised learning. Compared with other models, it uses fewer input features and further improves the prediction results.

The experimental results show that the GAN and BiLSTM model proposed in this thesis is effective in predicting the secondary structure of proteins. Training with complete sequences better captures the long-distance and contextual interactions between amino acids while reducing the effort spent on feature design, which provides new ideas for protein secondary structure prediction.
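As an illustration of two notions used in the abstract, the Q3 measure and fixed-length sliding-window input, the following is a minimal Python sketch and not the thesis's code. It assumes the usual three-state labels (H, E, C) given as strings, and it assumes one window centred on each residue with padding at the sequence ends; the abstract does not say how its windows were built. The function names `q3_accuracy` and `sliding_windows` and the padding symbol are hypothetical.

```python
from typing import List


def q3_accuracy(true_ss: str, pred_ss: str) -> float:
    """Return the percentage of residues whose 3-state label is predicted correctly."""
    if len(true_ss) != len(pred_ss):
        raise ValueError("true and predicted label strings must have equal length")
    if not true_ss:
        raise ValueError("label strings must not be empty")
    # Count positions where the predicted state matches the observed state.
    correct = sum(t == p for t, p in zip(true_ss, pred_ss))
    return 100.0 * correct / len(true_ss)


def sliding_windows(sequence: str, size: int, pad: str = "-") -> List[str]:
    """Return one fixed-length window centred on each residue (assumed scheme)."""
    if size < 1 or size % 2 == 0:
        raise ValueError("window size must be a positive odd number")
    half = size // 2
    # Pad both ends so the first and last residues also get a full window.
    padded = pad * half + sequence + pad * half
    return [padded[i:i + size] for i in range(len(sequence))]


if __name__ == "__main__":
    print(q3_accuracy("HHHEEECCC", "HHHEECCCC"))
    print(sliding_windows("ACDE", 3))
```

With windowed input each residue is classified from a bounded neighbourhood only, whereas a complete-sequence input lets a bidirectional recurrent model see the whole chain. That is the contrast the abstract draws between its two combined models.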
Keywords/Search Tags: protein, protein secondary structure prediction, deep learning, recurrent network, generative adversarial network