
Research On Protein Structure Prediction Method Based On Multi-level Information Fusion

Posted on: 2022-01-31 | Degree: Master | Type: Thesis
Country: China | Candidate: J Chen
GTID: 2480306731977979 | Subject: Computer technology
Abstract/Summary:
Proteins play a variety of important roles in living systems and take part in many biological processes, such as catalyzing metabolic reactions and triggering feedback, functions that are closely tied to their unique native structures. Experimental techniques for determining native structures are time-consuming and labor-intensive and cannot keep pace with advances in protein sequencing technology. This has motivated computational methods that predict the tertiary structure of a target protein from its amino acid sequence alone. Ab initio prediction is one of the most widely used of these methods, but two major obstacles have long limited it: the high dimensionality of the conformational space formed by all possible structures, and the energy function that must represent the potential energy of any conformation. To address these two challenges, this thesis studies conformational space search based on basic evolutionary algorithms and, combining it with machine learning, proposes a protein structure prediction method based on multi-level information fusion. The specific work is as follows.

First, EvoP, a pipeline for decoy conformation evaluation based on optimized search. Because protein conformational space is high-dimensional and continuous, this pipeline of conformation evaluation strategies is proposed to improve the efficiency of decoy sampling and the quality of the resulting conformation set. It comprises three strategies: population update, conformational space optimization, and dynamic optimization of the fragment library. It supplies high-quality sampling starting points for the two subsequent methods (conformational space resampling based on sequence distribution and secondary structure, and folding prediction based on multi-level information fusion), helping to reduce the search space and to cover as many low-potential-energy regions as possible.

Second, SDSS, a resampling method for protein conformational space based on residues and secondary structure. Random search algorithms used for complex search and optimization problems in evolutionary computation tend to converge prematurely. To counter this, the thesis combines coarse-graining with fragment replacement to build several discrete evaluation models based on the amino acid sequence distribution, the secondary structure types of residues, and the structural diversity of conformations. On this basis it proposes SDSS, a conformation resampling algorithm that narrows the conformational space of the target and improves the sampling of higher-quality decoy structures. Compared with the popular ab initio protocols Rosetta and QUARK, the average TM-score of the first predicted model of SDSS on 10 targets is 2.6% and 3.7% higher, respectively, placing it closer to the experimentally determined native structures.

Third, MULFOLD, a protein structure prediction method based on multi-level constraints. To exploit the latent inter-residue distance constraints and interaction signals encoded in the amino acid sequence, an optimized contact prediction model, DNconX, was designed on the basis of the DNcon2 network architecture, and an evaluation model based on interaction constraints was built through selective feature training. In addition, a composite crossover strategy is used during population evolution, switching between crossover stages according to the topological consistency score of each conformation. Using these evaluation models to guide the generation and optimization of the fragment library yields MULFOLD, a protein structure prediction algorithm based on multi-level information fusion. Experiments show that MULFOLD achieves the best average TM-score and mean RMSD on the test set, as well as the largest number of targets with a TM-score above 0.5. On 20 PDB targets, its mean RMSD and RMSD are better than those of Rosetta by 0.49 Å and 0.52 Å, respectively. By drawing on multiple levels of prior knowledge, MULFOLD avoids oversampling and makes population iteration more efficient, bringing the output decoy sets closer to the native structure and helping to compensate for the low accuracy of the energy function.
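The results above are reported as TM-score and RMSD against the native structure. As a hedged illustration (not the thesis's evaluation code), the Python sketch below computes both for two residue-aligned C-alpha coordinate lists that are assumed to be already superposed; the names `rmsd`, `tm_d0` and `tm_score_fixed` are hypothetical. In practice RMSD is reported after an optimal least-squares (Kabsch) superposition, and the TM-score is the maximum of the normalized sum over all superpositions, so `tm_score_fixed` only gives a lower bound for a fixed alignment. The clamp of d0 to 0.5 Å for short chains is an assumption following common practice.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]


def rmsd(model: Sequence[Coord], native: Sequence[Coord]) -> float:
    """RMSD (angstroms) between equal-length, residue-aligned coordinate
    lists that are ASSUMED to be already optimally superposed."""
    if len(model) != len(native) or not model:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = sum(math.dist(a, b) ** 2 for a, b in zip(model, native))
    return math.sqrt(squared / len(model))


def tm_d0(length: int) -> float:
    """Length-dependent distance scale d0 = 1.24*(L-15)^(1/3) - 1.8.
    Clamped to 0.5 for short chains (assumption, common practice)."""
    if length <= 21:
        return 0.5
    return max(0.5, 1.24 * (length - 15) ** (1.0 / 3.0) - 1.8)


def tm_score_fixed(model: Sequence[Coord], native: Sequence[Coord]) -> float:
    """TM-score sum for ONE given superposition, normalized by the native
    length. Residue i of the model is paired with residue i of the native;
    the true TM-score maximizes this value over all superpositions."""
    length = len(native)
    d0 = tm_d0(length)
    total = sum(1.0 / (1.0 + (math.dist(a, b) / d0) ** 2)
                for a, b in zip(model, native))
    return total / length


if __name__ == "__main__":
    # Toy 30-residue "native": C-alpha atoms 3.8 A apart on a straight line.
    native = [(3.8 * i, 0.0, 0.0) for i in range(30)]
    # Toy "decoy": every atom displaced by exactly 1 A along y.
    decoy = [(x, y + 1.0, z) for x, y, z in native]
    print(rmsd(decoy, native))                       # 1.0
    print(round(tm_score_fixed(decoy, native), 3))   # 0.613
```

Unlike RMSD, each residue's contribution to the TM-score saturates, so a few badly placed loops cannot dominate the score, and the d0 scaling keeps scores comparable across chain lengths. A TM-score above 0.5 is commonly read as indicating the same fold, which is the threshold used in the MULFOLD results.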
Keywords/Search Tags: Protein structure prediction, Evolutionary search, Residue interaction, Ab initio