
Protein Backbone Dihedral Angle Prediction Based On Lightweight Deep Learning Models

Posted on: 2024-01-23
Degree: Master
Type: Thesis
Country: China
Candidate: M L Zheng
Full Text: PDF
GTID: 2530307127963819
Subject: Software engineering
Abstract/Summary:
Protein function is closely tied to protein structure, so understanding protein structure has long been an important research topic in biology and bioinformatics. The primary degrees of freedom of the protein backbone are the dihedral angles (Φ, Ψ), defined by rotation about the N-Cα and Cα-C bonds of the backbone. These angles are crucial for the three-dimensional structure and spatial conformation of proteins. Estimating the backbone dihedral angles from the primary sequence can significantly improve 3D structure prediction and speed up sampling of the low-energy conformational space, making it a quick and efficient tool for biological research. The main work of this thesis is as follows:

1) The problem of predicting dihedral angles by computational methods is studied and analyzed in depth. First, the general scheme of computational models is analyzed, including the characterization of amino acid residues, the representation of prediction results, the evaluation metrics for prediction performance, and public datasets. Next, traditional machine learning methods applied to dihedral angle prediction are analyzed, and recent prediction models built with deep learning are systematically surveyed. The advantages and disadvantages of each model are summarized.

2) Deep learning techniques for predicting protein dihedral angles and other structural properties have advanced significantly in recent years, owing to the growth in protein sample data and improvements in computing performance. However, most state-of-the-art (SOTA) prediction models have the following disadvantages: they are built on recurrent neural networks, which cannot be trained in parallel, resulting in slow training; they are large and need more computing resources, which hinders wider adoption; and they are ensembles of multiple base models. Based on this analysis, a lightweight and faster deep learning model named DCMA is proposed. First, a hybrid inception block is designed using convolutional neural networks: multiple convolutional layers with different kernel sizes are assembled, and groups of stacked dilated convolutions with the same kernel size but different dilation rates are also combined in the block. The block is intended to capture both the long-range and the local features of protein sequences. To make feature capture more efficient, a 1A2I module is formed by combining one multi-attention block with two hybrid inception blocks. The DCMA model is a stack of five 1A2I modules. Multi-task learning is also applied in the DCMA model to constrain the outputs.

3) Based on the DCMA model, the effects of ensemble learning, of features generated by pre-trained protein sequence models, and of the spatial structural attribute of contact maps are analyzed. The experiments demonstrate that: an ensemble of multiple independently trained base models can effectively improve prediction performance; using the contact map predicted by the SPOT-1D software as residue representation features improves prediction performance when the contact map prediction is more accurate; and using the outputs of the pre-trained models ESM-1v and ProtT5 as additional inputs also significantly enhances prediction performance.
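
The abstract mentions the representation of prediction results and the evaluation metrics without giving details. As an illustration only, the sketch below shows one common way to handle the periodicity of Φ and Ψ: encoding each angle as a (sin, cos) pair and measuring error as the shorter arc between two angles. Whether the thesis uses exactly this representation or metric is an assumption, and all function names here are hypothetical.

```python
import math

def encode_angle(deg):
    """Map an angle in degrees to a (sin, cos) pair.

    Assumption: sin/cos encoding is a common choice because it is
    continuous across the +/-180 degree boundary; the thesis may differ.
    """
    rad = math.radians(deg)
    return math.sin(rad), math.cos(rad)

def decode_angle(sin_v, cos_v):
    """Recover an angle in degrees, in [-180, 180], from a (sin, cos) pair."""
    return math.degrees(math.atan2(sin_v, cos_v))

def angular_error(pred_deg, true_deg):
    """Absolute difference between two angles along the shorter arc (0 to 180)."""
    diff = abs(pred_deg - true_deg) % 360.0
    return min(diff, 360.0 - diff)

def mean_absolute_error(preds, targets):
    """Periodicity-aware mean absolute error over paired angle lists (degrees)."""
    errors = [angular_error(p, t) for p, t in zip(preds, targets)]
    return sum(errors) / len(errors)
```

With this metric, a prediction of 179° against a true value of -179° counts as an error of 2°, not 358°, which is why a periodicity-aware error is usually preferred over a plain absolute difference for dihedral angles.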
Keywords/Search Tags: Dihedral Angle, Dilated Convolution, H-inception, Attention Mechanism, Lightweight Model