
Research On Population Evolution Based Conformation Optimization Algorithms In Ab-initio Protein Structure Prediction

Posted on: 2020-02-07
Degree: Doctor
Type: Dissertation
Country: China
Candidate: X H Hao
GTID: 1360330599976108
Subject: Control Science and Engineering
Abstract/Summary:
Proteins are widely distributed in the tissues of living organisms and play an indispensable role in their life processes. Obtaining the three-dimensional structure of a protein is the most direct and effective way to understand its function, to reveal the pathogenesis of diseases caused by changes in protein structure, and ultimately to enable targeted treatment. The development of information technology and its intersection with biology provide a fast, low-cost computational route to three-dimensional protein structures. Designing effective algorithms that compute the three-dimensional structure of a protein directly from its amino acid sequence is therefore an active research topic in bioinformatics.

This thesis focuses mainly on conformation optimization methods for protein structure prediction. Because predicted 3D structures, distance profiles obtained from homologous sequence alignments, residue contact maps, and secondary structure information can reflect how amino acid changes affect protein structure, the thesis also conducts a preliminary study of cancer driver mutation prediction. Within a population-based evolutionary algorithm framework, the thesis proposes an abstract convex underestimate guided conformational space sampling method for ab-initio protein structure prediction; a Lipschitz underestimate guided conformational feature space sampling method that further improves sampling efficiency; and a multi-population conformational sampling method that keeps reasonable conformations alive during sampling. For cancer driver mutation prediction, it proposes a method for assessing and predicting single amino acid mutations based on amino acid sequence and protein structural information.

The main works of this thesis are summarized as follows:

1. To address the problem of searching the protein conformational space in ab-initio protein structure prediction, a novel method using abstract convex underestimation (ACUE) within an evolutionary algorithm framework is proposed. A feature extraction technique converts the high-dimensional original conformational space into a feature space of considerably lower dimension, and an underestimate space is then constructed according to abstract convex theory. The resulting tight lower-bound estimates are used to guide the search direction. In addition, fragment assembly and the Monte Carlo method are combined to generate a series of metastable conformations by sampling the conformational space. Test results show that ACUE can efficiently obtain near-native protein structures.

2. To further improve sampling efficiency, a plug-in method that guides exploration of the conformational feature space with Lipschitz underestimation (LUE) is proposed within the population-based evolutionary algorithm framework. The conformational space is first converted into an ultrafast shape recognition (USR) feature space and then further into an underestimation space, according to Lipschitz estimation theory, to guide exploration. Using the underestimate information reduces the number of energy function evaluations and thus improves sampling efficiency. Test results show that near-native protein structures with high accuracy can be obtained more rapidly and efficiently.

3. Sampling the protein conformational space for structure prediction can be regarded as a multimodal optimization problem. To address this, a conformational space sampling method using multi-subpopulation differential evolution (MDE) is proposed. MDE first generates a given number of modes of interest using a modal identification protocol based on ultrafast shape recognition. Differential evolution is then used to keep these preserved modes alive throughout the evolution. Meanwhile, for modal enhancement, a local descent direction along which to sample is constructed with the abstract convex underestimate technique, strengthening sampling in lower-energy regions. Through this evolutionary sampling process, several clusters are obtained, each containing a series of conformations in proportion to the energy score. Test results show that MDE can effectively obtain near-native conformations.

4. For the cancer driver mutation prediction problem, a computational method for assessing single amino acid variants (AssVar) based on amino acid sequence and protein structural information is proposed. First, 22 features ranging from the amino acid level to the protein 3D structure level are extracted and used as input to a Random Forest classifier. Next, collected cancer driver mutations and neutral mutations are used to train the classifier. AssVar is then applied to an independently collected test data set to evaluate its performance and compare it with other state-of-the-art methods. Finally, a case study demonstrates the effectiveness of AssVar.
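Work 1 uses abstract convex underestimation to turn already evaluated conformations into a lower bound that steers the search. The abstract does not give the exact support functions, so the sketch below shows the textbook cutting-angle construction for increasing positively homogeneous (IPH) functions on the unit simplex: each evaluated point x^k contributes h^k(x) = min_i l_i^k x_i with l_i^k = f(x^k) / x_i^k, and the underestimate is H(x) = max_k h^k(x). It assumes feature vectors have already been normalized onto the simplex with positive components and energies shifted to be positive; `pick_most_promising` is a hypothetical guidance rule, not the thesis's method.

```python
from typing import Sequence

Vector = Sequence[float]


def support_vector(x_k: Vector, f_k: float) -> list[float]:
    """Support vector l^k with components l_i^k = f(x^k) / x_i^k.

    Assumes x_k lies in the interior of the unit simplex (every x_i^k > 0)
    and f_k > 0 (energies shifted to be positive).
    """
    return [f_k / xi for xi in x_k]


def underestimate(x: Vector, supports: list[list[float]]) -> float:
    """H(x) = max_k min_i l_i^k * x_i.

    For an IPH objective this never exceeds f(x), and it equals f(x^k)
    at every evaluated point x^k.
    """
    return max(min(l_i * x_i for l_i, x_i in zip(l, x)) for l in supports)


def pick_most_promising(candidates: list[Vector], supports: list[list[float]]) -> int:
    """Hypothetical guidance rule: index of the candidate with the lowest lower bound."""
    return min(range(len(candidates)), key=lambda c: underestimate(candidates[c], supports))
```

For example, with the linear IPH function f(x) = 2x_0 + 3x_1 + x_2 evaluated at three simplex points, the bound at the simplex centre is 1.2 while the true value is 2, so the bound is valid but loose; it tightens as more evaluated points are added.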
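Work 1 also combines fragment assembly with the Monte Carlo method to produce metastable conformations. A common way to do this, assumed here, is to overwrite a window of backbone torsion angles with a fragment drawn from a library and accept the move with the Metropolis criterion. In the toy below a conformation is a flat list of angles and the energy is a caller-supplied function; both are stand-ins for a real fragment library and energy function, and `monte_carlo_fragment_assembly` is a hypothetical name.

```python
import math
import random
from typing import Callable

Conformation = list[float]                      # toy stand-in: one torsion angle per residue
FragmentLibrary = dict[int, list[list[float]]]  # start index -> candidate angle windows


def insert_fragment(conf: Conformation, fragments: FragmentLibrary, rng: random.Random) -> Conformation:
    """Copy conf and overwrite one window with a randomly chosen fragment.

    Assumes every window fits, i.e. start + len(window) <= len(conf).
    """
    start = rng.choice(sorted(fragments))
    window = rng.choice(fragments[start])
    trial = conf.copy()
    trial[start:start + len(window)] = window
    return trial


def monte_carlo_fragment_assembly(
    conf: Conformation,
    fragments: FragmentLibrary,
    energy: Callable[[Conformation], float],
    steps: int,
    temperature: float,
    rng: random.Random,
) -> tuple[Conformation, float]:
    """Metropolis Monte Carlo over fragment insertions; returns the lowest-energy conformation visited."""
    current, e_current = conf.copy(), energy(conf)
    best, e_best = current.copy(), e_current
    for _ in range(steps):
        trial = insert_fragment(current, fragments, rng)
        e_trial = energy(trial)
        delta = e_trial - e_current
        # Metropolis criterion: always accept downhill moves, uphill ones with probability exp(-delta / T).
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current, e_current = trial, e_trial
            if e_current < e_best:
                best, e_best = current.copy(), e_current
    return best, e_best
```

Tracking the best conformation separately from the current one lets the chain wander uphill at finite temperature without losing the lowest-energy structure it has seen.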
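Works 2 and 3 both map conformations into an ultrafast shape recognition (USR) feature space. USR describes a 3D point set by the distance distributions to four reference points (the centroid, the atom closest to it, the atom farthest from it, and the atom farthest from that one), summarizing each distribution with three moments to give 12 numbers. The sketch below uses the mean, the standard deviation, and the signed cube root of the third central moment, which is one common variant; how the thesis selects atoms (for example, only C-alpha positions) is not stated in the abstract and is left to the caller.

```python
import math
from typing import Sequence

Point = Sequence[float]


def _moments(dists: list[float]) -> list[float]:
    """Mean, standard deviation and signed cube root of the third central moment."""
    n = len(dists)
    mean = sum(dists) / n
    var = sum((d - mean) ** 2 for d in dists) / n
    third = sum((d - mean) ** 3 for d in dists) / n
    return [mean, math.sqrt(var), math.copysign(abs(third) ** (1 / 3), third)]


def usr_descriptor(coords: list[Point]) -> list[float]:
    """12-dimensional USR descriptor of a set of 3D coordinates."""
    ctd = tuple(sum(p[i] for p in coords) / len(coords) for i in range(3))
    d_ctd = [math.dist(p, ctd) for p in coords]
    cst = coords[d_ctd.index(min(d_ctd))]  # atom closest to the centroid
    fct = coords[d_ctd.index(max(d_ctd))]  # atom farthest from the centroid
    d_fct = [math.dist(p, fct) for p in coords]
    ftf = coords[d_fct.index(max(d_fct))]  # atom farthest from fct
    descriptor: list[float] = []
    for ref in (ctd, cst, fct, ftf):
        descriptor += _moments([math.dist(p, ref) for p in coords])
    return descriptor


def usr_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """USR score 1 / (1 + mean absolute difference); equals 1 for identical descriptors."""
    return 1.0 / (1.0 + sum(abs(x - y) for x, y in zip(a, b)) / len(a))
```

Because only inter-point distances enter the descriptor, it is invariant to translation and rotation, which is why no structural superposition is needed before comparing conformations.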
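Work 2 states that Lipschitz underestimation reduces the number of energy function evaluations. One way this can work, assumed here, is the classic Lipschitz bound H(x) = max_k [f(x_k) - L * ||x - x_k||]: if f is L-Lipschitz in the feature space, H never exceeds f, so a trial whose bound already reaches the target's energy cannot win the selection and need not be evaluated. The constant `lipschitz`, the Euclidean norm, and the helper names are illustrative assumptions rather than the thesis's settings; in practice L must be estimated, and an underestimated L makes the bound unsafe.

```python
import math
from typing import Callable, Optional, Sequence

Point = Sequence[float]
Archive = list[tuple[list[float], float]]  # (feature vector, evaluated energy)


def lipschitz_lower_bound(x: Point, archive: Archive, lipschitz: float) -> float:
    """max_k [f(x_k) - L * ||x - x_k||]: a valid lower bound if f is L-Lipschitz."""
    return max(f_k - lipschitz * math.dist(x, x_k) for x_k, f_k in archive)


def lue_select(
    trial: Point,
    target_energy: float,
    archive: Archive,
    energy: Callable[[Point], float],
    lipschitz: float,
) -> tuple[bool, Optional[float]]:
    """Return (accept, trial_energy); trial_energy is None when evaluation was skipped."""
    if lipschitz_lower_bound(trial, archive, lipschitz) >= target_energy:
        # The trial cannot improve on the target, so the energy call is saved.
        return False, None
    e_trial = energy(trial)
    archive.append((list(trial), e_trial))  # every real evaluation tightens later bounds
    return e_trial < target_energy, e_trial
```

The saving comes from trials that land close to known high-energy points: their bound is high, so they are rejected without calling `energy`.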
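Work 3 keeps several modes alive by evolving separate subpopulations with differential evolution. The sketch below shows the standard DE/rand/1/bin operator applied independently inside each subpopulation with greedy one-to-one selection. Because trial vectors are built only from members of the same subpopulation, a subpopulation clustered around one mode tends to stay there. It omits the USR-based modal identification and the abstract convex local descent step described above; the parameter values and the name `evolve_subpopulations` are assumptions.

```python
import random
from typing import Callable, Optional

Individual = list[float]


def de_rand_1_bin(pop: list[Individual], i: int, f: float, cr: float, rng: random.Random) -> Individual:
    """DE/rand/1/bin trial vector for individual i, built only from members of pop (needs len(pop) >= 4)."""
    dim = len(pop[i])
    r1, r2, r3 = rng.sample([j for j in range(len(pop)) if j != i], 3)
    mutant = [pop[r1][d] + f * (pop[r2][d] - pop[r3][d]) for d in range(dim)]
    j_rand = rng.randrange(dim)  # guarantees at least one component comes from the mutant
    return [mutant[d] if (d == j_rand or rng.random() < cr) else pop[i][d] for d in range(dim)]


def evolve_subpopulations(
    subpops: list[list[Individual]],
    energy: Callable[[Individual], float],
    generations: int,
    f: float = 0.5,
    cr: float = 0.9,
    rng: Optional[random.Random] = None,
) -> tuple[list[list[Individual]], list[list[float]]]:
    """Evolve each subpopulation in place; returns the populations and their cached energies."""
    rng = rng or random.Random()
    energies = [[energy(x) for x in pop] for pop in subpops]
    for _ in range(generations):
        for pop, e_pop in zip(subpops, energies):
            for i in range(len(pop)):
                trial = de_rand_1_bin(pop, i, f, cr, rng)
                e_trial = energy(trial)
                if e_trial <= e_pop[i]:  # greedy selection: an individual's energy never increases
                    pop[i], e_pop[i] = trial, e_trial
    return subpops, energies
```

As a usage example, two subpopulations seeded near the two minima of a double-well function each improve their own members without being pulled into a single basin, which is the kind of mode preservation Work 3 relies on.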
Keywords/Search Tags: protein structure prediction, abstract convex underestimate, evolution algorithm, multimodal optimization, cancer driver mutation