
Protein Tertiary Structure Prediction Based On Deep Learning

Posted on: 2021-05-23
Degree: Master
Type: Thesis
Country: China
Candidate: N Q Huang
Full Text: PDF
GTID: 2370330614953809
Subject: Computer Science and Technology
Abstract/Summary:
Proteins are large, important biological molecules. Predicting a protein's tertiary structure directly from its amino acid sequence is a challenging problem with significant impact on modern biology and medicine. Such predictions play key roles in understanding protein function, designing proteins with new biological functions, and developing new drugs. Since the completion of the Human Genome Project, genome-sequencing technologies have determined the amino acid sequences of ever more proteins. Meanwhile, researchers have continued to explore ways of determining protein structures. The main experimental methods are currently X-ray crystallography, NMR, and cryo-EM. These methods often require much time and expensive resources, so experimental structure determination cannot keep up with the explosive growth in the number of available sequences. This gap has become one of the main obstacles to quickly obtaining a large number of protein structures.

In response, this thesis proposes using deep learning to predict the tertiary structure of proteins without relying on complex experimental observation steps. The work is divided into four parts:

1. Template-based modeling and de novo modeling each have a main shortcoming at this stage: template-based modeling may fail to find a template covering the whole protein, or may find only a poor-quality one, while de novo modeling lacks template support and therefore usually yields low prediction accuracy. This thesis proposes using template constraints to assist de novo modeling, combining the advantages of the existing mainstream methods. The main idea is to use a de novo modeling method and to optimize the results during modeling with the additional information provided by templates.

2. For ab initio modeling, this thesis designs two deep neural networks that predict inter-residue distances and backbone dihedral angles by integrating structural, coevolutionary, and physicochemical features. The predictions are then used to constrain the modeling. To obtain the initial conformation for de novo modeling, PISCES is used to filter the protein database and build a local fragment library. During fragment selection, the predicted dihedral angles serve as constraints and are combined with other chemical and structural features to control the quality of the selected fragments. Finally, the initial conformation is obtained by assembling the selected fragments.

3. For template information, instead of the traditional approach of using a single known protein structure as the template, this thesis uses multiple templates. After multiple sequence alignment, the aligned sequences are clustered by homology similarity, and the weight of each cluster is calculated from its degree of similarity. For the simulated annealing process of ab initio modeling, this thesis proposes a stepwise iteration method based on the confidence of the constraint information, so that the predicted structure gradually becomes more accurate within a reasonable range.

4. To evaluate the large number of candidate structures generated for each prediction target, this thesis uses a method that integrates multi-scale predicted features. Because the predicted features are themselves inaccurate, this thesis uses only linear weighting, in contrast to traditional machine learning models that take many predicted values as features. This avoids overfitting to inaccurate features and keeps the calculation model simple and efficient.

The targets released in CASP13 are used as test data so that the method can be compared fairly. On the 45 contact prediction targets in the FM and TBM/FM categories, the F1 score of our method ranks first; in tertiary structure prediction in the FM category, the sum of the Z-scores of our method ranks second.
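
The linear weighting used to evaluate candidate structures can be illustrated with a minimal sketch. This is not the thesis's actual scoring function: the feature names (`distance_agreement`, `dihedral_agreement`, `template_agreement`), the weights, and the helper names `linear_score` and `rank_candidates` are all hypothetical and chosen only to show the idea of ranking candidates by a fixed weighted sum instead of a trained model.

```python
from typing import Dict, List

# Hypothetical weights: the abstract does not state which features are
# combined or how they are weighted.
WEIGHTS: Dict[str, float] = {
    "distance_agreement": 0.5,   # agreement with predicted inter-residue distances
    "dihedral_agreement": 0.3,   # agreement with predicted backbone dihedral angles
    "template_agreement": 0.2,   # agreement with template-derived constraints
}


def linear_score(features: Dict[str, float], weights: Dict[str, float]) -> float:
    """Return the weighted sum of the features; a missing feature counts as 0."""
    return sum(w * features.get(name, 0.0) for name, w in weights.items())


def rank_candidates(
    candidates: Dict[str, Dict[str, float]], weights: Dict[str, float]
) -> List[str]:
    """Return candidate names sorted from highest to lowest linear score."""
    return sorted(
        candidates,
        key=lambda name: linear_score(candidates[name], weights),
        reverse=True,
    )


# Made-up feature values for two candidate structures, for illustration only.
candidates = {
    "model_a": {
        "distance_agreement": 0.8,
        "dihedral_agreement": 0.6,
        "template_agreement": 0.5,
    },
    "model_b": {
        "distance_agreement": 0.6,
        "dihedral_agreement": 0.9,
        "template_agreement": 0.4,
    },
}

ranking = rank_candidates(candidates, WEIGHTS)
print(ranking)
```

Because the weights are fixed rather than learned from many noisy features, a scorer of this kind has little capacity to fit errors in the predicted features, which is the motivation the abstract gives for preferring linear weighting.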
Keywords/Search Tags:Protein Structure Prediction, Protein Quality Assessment, Deep Learning, Ab initio modelling, Simulated Annealing