Assessment Of Protein Structure Models Based On Generalized Solvation Free Energy Theory

Posted on: 2021-01-05  Degree: Master  Type: Thesis
Country: China  Candidate: J Zhang  Full Text: PDF
GTID: 2370330620971936  Subject: Biological engineering
Abstract/Summary:
Protein structure prediction is one of the most representative and influential research directions in computational biology and bioinformatics. Protein structure model evaluation is generally the final step of protein tertiary structure prediction and protein design: it selects, from a large number of candidate structures, those closest to the real structure. At present there are two main approaches to evaluating protein structure models: "knowledge-based" (KB) and "physics-based" (PB). Traditional physics-based models have not performed as well. We propose a generalized solvation free energy framework. The main idea is to define each basic physical component unit of a given complex system as a solute, and all of its surrounding units as that solute's specific solvent. The framework can be applied flexibly at multiple scales and is well suited to implementation with machine learning. The Cullpdb dataset was generated on 2018-11-26, with a similarity of less than 25% between any two sequences, a resolution better than 2.0 Å, and an R-factor below 0.25. We downloaded 8129 entries from the PDB database as the original data set. In this work, single amino acids and amino acid triplets are taken as the solute units. For each solute unit we define its specific solvent environment and use the Biopython library in Python to process the PDB data, extracting and computing the solvent features: six angles describing the relative position of the solute unit and its solvent environment, the spatial distance between the solute unit and the solvent environment, and the amino acid types in the solvent environment. From these features, a generalized solvation free energy model at the amino acid level was implemented with neural networks. The model aims to predict the solute unit category as accurately as possible. The more accurate the prediction, the lower the calculated free energy of the protein, which indicates that the corresponding structure is more stable; this lets us distinguish native from non-native protein structures. For the neural network model of amino acid triplets, we constructed a new hierarchical softmax output layer, replacing the original softmax output layer with a Huffman tree structure, which improved training efficiency to some extent.
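The abstract does not give the details of the hierarchical softmax layer or of the energy formula, so the following is only a minimal sketch of the general technique under stated assumptions. It builds a Huffman tree over the 20 amino acid categories, computes each category's probability as a product of sigmoid decisions along its root-to-leaf path, and converts a probability into a pseudo-energy as -ln(p). The frequency table, the node scores, the -ln(p) form, and all function names are hypothetical placeholders chosen for illustration; in a trained network each node score would come from the hidden layer rather than from a fixed formula.

```python
import heapq
import itertools
import math

# Rough placeholder frequencies (percent) for the 20 amino acids.
# These are illustrative assumptions, not values from the thesis.
FREQS = {
    "A": 8.3, "R": 5.5, "N": 4.1, "D": 5.4, "C": 1.4, "Q": 3.9, "E": 6.8,
    "G": 7.1, "H": 2.3, "I": 5.9, "L": 9.9, "K": 5.8, "M": 2.4, "F": 3.8,
    "P": 4.7, "S": 6.6, "T": 5.3, "W": 1.1, "Y": 2.9, "V": 6.9,
}


def build_huffman_paths(freqs):
    """Return {symbol: [(internal_node_id, bit), ...]} from root to leaf."""
    tiebreak = itertools.count()   # keeps heap entries comparable on equal weights
    node_ids = itertools.count()   # numbers the internal nodes 0, 1, 2, ...
    heap = [(w, next(tiebreak), sym) for sym, w in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        w0, _, left = heapq.heappop(heap)    # two lightest subtrees
        w1, _, right = heapq.heappop(heap)
        merged = (next(node_ids), left, right)
        heapq.heappush(heap, (w0 + w1, next(tiebreak), merged))
    root = heap[0][2]

    paths = {}

    def walk(node, path):
        if isinstance(node, str):            # leaf: an amino acid symbol
            paths[node] = path
            return
        node_id, left, right = node
        walk(left, path + [(node_id, 0)])
        walk(right, path + [(node_id, 1)])

    walk(root, [])
    return paths


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def leaf_probability(path, node_scores):
    """Multiply the binary decisions along one root-to-leaf path."""
    p = 1.0
    for node_id, bit in path:
        go_right = sigmoid(node_scores[node_id])
        p *= go_right if bit == 1 else 1.0 - go_right
    return p


def pseudo_energy(prob):
    """Assumed energy form: higher probability gives lower energy."""
    return -math.log(prob)


PATHS = build_huffman_paths(FREQS)
# Hypothetical fixed scores, one per internal node (20 leaves -> 19 nodes).
NODE_SCORES = {i: math.sin(i) for i in range(len(FREQS) - 1)}
PROBS = {aa: leaf_probability(path, NODE_SCORES) for aa, path in PATHS.items()}

print("sum of probabilities:", round(sum(PROBS.values()), 6))
print("path length for L:", len(PATHS["L"]), "for W:", len(PATHS["W"]))
print("pseudo-energy of predicting L:", pseudo_energy(PROBS["L"]))
```

Because every internal node splits its probability mass between its two children, the leaf probabilities sum to one without a normalisation over all categories. Evaluating one category touches only the nodes on its path, which is the usual source of the training speed-up, and the gain would be larger for the triplet model, where the number of categories is far greater than 20.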
Keywords/Search Tags:Protein structure prediction, generalized solvation free energy, neural network, hierarchical softmax