Research On Prediction Of Polyproline Type Ⅱ Structure Based On Multi-feature Fusion

Posted on: 2024-06-25
Degree: Master
Type: Thesis
Country: China
Candidate: C Feng
GTID: 2530307136975719
Subject: Computer technology
Abstract/Summary:
Protein function is closely tied to three-dimensional structure, so predicting protein secondary structure is of great significance in bioinformatics. Predicting the polyproline type Ⅱ helix (PPⅡ helix) is important in many research areas, such as protein folding mechanisms, drug targets, and protein function. However, many existing PPⅡ helix prediction algorithms encode protein sequence information in a single way, so the models learn sequence features insufficiently. Research on PPⅡ structure prediction is further limited by the difficulty of considering both local and global information in protein sequences, the scarcity of prediction models, and the complexity of the models that exist. For these reasons, this thesis uses deep learning models to improve the prediction accuracy of the polyproline type Ⅱ structure. The main research work is as follows:

(1) Multi-feature fusion. To enrich the encoding of protein sequences, we collected common encoding methods, namely Word2vec, PSSM, amino acid orthogonal codes (OC), and physicochemical properties of amino acids (PhyChem), to characterize amino acids, and fused these features as input to the model. We propose the ATT-DCNN-BILSTM model, which is applied to PPⅡ helix prediction for the first time. Combined with a sliding window technique, the data set is divided into windows of the optimal size and fed to the model to capture the characteristics of local amino acid residues. According to the experimental results, the optimal sliding window size is 13 and the best feature combination is Word2vec_PSSM_OC_PhyChem. We also explored the influence of the attention mechanism on the model. Experiments show that, compared with the most advanced prediction algorithm (BERT-PPⅡ), on the balanced PPⅡDB91 the proposed ATT-DCNN-BILSTM model improves sensitivity, Matthews correlation coefficient, accuracy, and AUC by 26.6%, 0.7%, 5.7%, and 12.6%, respectively; similarly, on the balanced PPⅡDB99 these metrics increase by 36.9%, 5.9%, 6.4%, and 11.5%, respectively. The proposed ATT-DCNN-BILSTM model therefore outperforms BERT-PPⅡ on most evaluation metrics.

(2) Fusion of local and global information. To improve protein sequence encoding, this thesis proposes a BERT-based PPⅡ helix prediction algorithm (BERT-PPⅡ), which learns protein sequence information with the BERT model. BERT's CLS vector fuses the information of every amino acid residue in a sample fairly, so we use the CLS vector as a global feature representing the sample's global context; to some extent, this enhances the capture of long-distance information. Because interactions among local amino acid residues in a protein chain strongly influence the formation of the PPⅡ helix, we use a CNN to extract local residue features, which further enriches the representation of protein sequence samples. A convolutional neural network suits this task because it is good at handling local correlations in dense spaces, the features of a protein sequence can be expressed as a matrix, and local amino acids carry rich information. We fuse the CLS vector with the CNN local features to improve PPⅡ structure prediction. Compared with the state-of-the-art PPⅡPRED method, experimental results on the unbalanced datasets show that the proposed method improves accuracy by 1% on PPⅡDB91 and 2% on PPⅡDB99. Compared with the best existing algorithms, the proposed algorithm improves accuracy (ACC) by 9.0% and AUC by 0.4% on the balanced PPⅡDB99; similarly, it improves ACC by 11.5% and AUC by 0.4% on the balanced PPⅡDB99. These results show that the proposed BERT-PPⅡ method achieves superior performance in predicting the PPⅡ helix.
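The sliding window step described in part (1) can be illustrated with a minimal sketch. It assumes a window of 13 residues centred on each position, padding with a placeholder residue "X" at the sequence ends, and uses only the orthogonal (one-hot) code as the per-residue feature. The function names, the padding character, and the example sequence are hypothetical; the abstract does not specify how the thesis pads sequences or orders the amino acid alphabet.

```python
# Hypothetical sketch: sliding windows of size 13 with orthogonal (one-hot) coding.
# The alphabet order and the "X" padding are assumptions, not taken from the thesis.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot(residue):
    """Return a 20-dimensional orthogonal code; unknown or padding residues map to all zeros."""
    vec = [0] * len(AMINO_ACIDS)
    idx = AA_INDEX.get(residue)
    if idx is not None:
        vec[idx] = 1
    return vec


def sliding_windows(sequence, window_size=13, pad="X"):
    """Return one window per residue, centred on that residue."""
    if window_size % 2 == 0:
        raise ValueError("window_size must be odd so each window has a centre residue")
    half = window_size // 2
    padded = pad * half + sequence + pad * half
    return [padded[i:i + window_size] for i in range(len(sequence))]


def encode_window(window):
    """Encode a window as a (window_size x 20) matrix of one-hot rows."""
    return [one_hot(residue) for residue in window]


sequence = "MKPPPLPAG"  # made-up example sequence
windows = sliding_windows(sequence)
features = encode_window(windows[0])
```

In a multi-feature setting, each row of the window matrix would be extended by concatenating the other per-residue features (for example PSSM columns or physicochemical values) before the matrix is passed to the model.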
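Sensitivity, accuracy, and the Matthews correlation coefficient reported above all follow from the counts of a binary confusion matrix. The sketch below uses the standard textbook definitions; the function name and the counts in the usage lines are hypothetical and are not results from the thesis. AUC is omitted because it requires ranked prediction scores rather than counts.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Compute sensitivity, accuracy, and MCC from confusion-matrix counts.

    Standard definitions; a ratio with a zero denominator is reported as 0.0.
    """
    total = tp + fp + tn + fn
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    accuracy = (tp + tn) / total if total else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"sensitivity": sensitivity, "accuracy": accuracy, "mcc": mcc}


# Hypothetical counts for illustration only.
perfect = binary_metrics(tp=50, fp=0, tn=50, fn=0)
chance = binary_metrics(tp=25, fp=25, tn=25, fn=25)
```

On a balanced data set a classifier that guesses at random has an MCC near 0 while accuracy sits near 0.5, which is why MCC is commonly reported alongside accuracy.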
Keywords/Search Tags: Prediction of polyproline type Ⅱ structure, BERT, Convolutional neural networks, Bi-directional LSTM, Attention mechanism