Study On Applying Computing Techniques To Protein Structure-activity Relationship

Posted on: 2018-10-01
Degree: Doctor
Type: Dissertation
Country: China
Candidate: L J Qua
GTID: 1310330542465285
Subject: Computer Science and Technology
Abstract/Summary:
The biological activities of organisms are directly or indirectly related to proteins, and a protein's function is determined by its structure. Structure-activity relationship analysis of proteins studies the relationship between protein structure and efficacy by computational methods. The side chain is almost the simplest three-dimensional structure in a protein, yet it plays an important role in protein function. The first topic of this dissertation is how to pack protein side chains; from a computational perspective, this is an optimization problem with an imprecise objective function. Under the influence of internal and external factors, protein side chains are prone to mutation, so the second topic is the possibility of mutation. Protein mutations can in turn lead to certain diseases, so the third topic is the quantitative correlation between protein mutation and disease. We treat all three topics as machine learning problems.

The first study predicts protein side chains with parallel ant colonies. Packing protein side chains amounts to selecting a suitable rotamer for each residue so that the resulting full-atom structure is as close as possible to the native one. The side-chain packing problem has been proven to be nondeterministic polynomial-time hard (NP-hard), and computational complexity analysis suggests that any global optimization algorithm for it may, in the worst case, run in exponential time. To overcome this, we propose a novel parallel meta-heuristic algorithm that combines different energy functions. The parallel ant colonies share a single pheromone matrix that guides the ants in constructing protein side-chain conformations. After packing the side chains, we further refine the selected rotamers into subrotamers with a gradient-based minimization procedure, which reasonably mitigates the discreteness of the rotamer library. Results on a classical test set confirm that our parallel approach is competitive with other state-of-the-art solutions for side-chain packing.

The second study predicts protein stability with gradient boosting regression trees (GBRT). Despite the rapid growth of genetic data, structural analysis remains costly and inefficient. We therefore establish a model that predicts the stability change caused by a point mutation from low-precision protein structure models. Physicochemical properties and the structural changes caused by mutation are known to have an important effect on protein stability. We construct the three-dimensional structure of each protein with I-TASSER and then obtain the structure of the mutant protein by reinserting the side chain. From these wild-type and mutant structures, we derive the change in the protein environment before and after the mutation. To describe the mutation environment more accurately, we introduce several features derived from multiple sequence alignment, multiple template alignment, and energy. Finally, a new regression model for stability change is built with the GBRT algorithm. Experiments on five independent data sets show that the model achieves the highest Pearson correlation coefficient among the state-of-the-art predictors compared.

The third study predicts disease-associated mutations with a novel machine-learning method, BANN. A new structure-activity relationship model is proposed for the complex relationship between protein mutation and function. By combining Bayesian classification with an artificial neural network, the model both takes statistical data into account, which reduces over-fitting, and captures a more accurate nonlinear relationship, which improves the accuracy and robustness of the predictions. Because current databases cover a wide variety of species whose rules differ, we automatically integrated the human data from the UniProt and PDB databases so that a one-to-one correspondence is quickly built among protein sequence, functional annotation, and three-dimensional structure. To describe the complex environment of the mutated position, we first introduce the biological unit as the object of analysis, so that certain intramolecular and intermolecular parameters can be analyzed. Two groups of experiments show that this method improves on the classical Bayesian classification and artificial neural network algorithms and obtains the highest prediction accuracy among the predictors compared on multiple test sets.

The major contributions of this dissertation fall into three areas. In side-chain prediction, a parallel meta-heuristic scheme based on the SHOP mechanism is used to simulate the mechanism of protein folding in nature, and per-residue refinement mitigates the discreteness of the rotamer library so that more accurate side-chain conformations are obtained. In the structural parameters, a mapping between different kinds of databases is required; in addition, specialized structural knowledge (template-based, mutated-structure-based, bioUnit-based, etc.) is introduced to describe the environment of the mutated location in detail. In the construction of the structure-activity model, a novel machine-learning method integrates Bayesian classification and an artificial neural network; guided by the probability density of each feature on different data sets, it gives a better description of the complicated structure-activity relationship.

All the experiments show that these studies contribute to research on protein structure-activity relationships, and the results and methods may serve as a reference for subsequent related studies. This dissertation also designs a distributed online prediction platform. Two online services have recently been offered: STRUM for predicting mutant stability changes and PreDAM for predicting disease-associated mutations.
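The idea of several ant colonies sharing one pheromone matrix while each scores conformations with a different energy function can be illustrated with a small sketch. This is not the dissertation's implementation: the toy energy (self terms plus pair terms between sequence neighbours), the per-colony weights, the update rule, and all function and parameter names are assumptions made for illustration, and the colonies run one after another here rather than in parallel.

```python
import random

def energy(assignment, self_e, pair_e, w_self=1.0, w_pair=1.0):
    """Toy energy (assumed): weighted self terms plus pair terms between neighbours."""
    e_self = sum(self_e[i][r] for i, r in enumerate(assignment))
    e_pair = sum(pair_e[i][assignment[i]][assignment[i + 1]]
                 for i in range(len(assignment) - 1))
    return w_self * e_self + w_pair * e_pair

def build_conformation(pheromone, rng):
    """One ant picks a rotamer index per residue, proportionally to pheromone."""
    return [rng.choices(range(len(row)), weights=row)[0] for row in pheromone]

def pack_side_chains(self_e, pair_e, colony_weights, n_ants=10, n_iters=60,
                     evaporation=0.1, floor=0.01, seed=0):
    """Hypothetical multi-colony ACO; every colony reads and writes one matrix."""
    rng = random.Random(seed)
    pheromone = [[1.0] * len(row) for row in self_e]  # shared by all colonies
    best, best_e = None, float("inf")
    for _ in range(n_iters):
        colony_bests = []
        for w_self, w_pair in colony_weights:
            # Each colony ranks its ants with its own weighting of the energy terms.
            ants = [build_conformation(pheromone, rng) for _ in range(n_ants)]
            colony_bests.append(
                min(ants, key=lambda a: energy(a, self_e, pair_e, w_self, w_pair)))
        # Evaporate, keeping a small floor so every rotamer stays selectable.
        for row in pheromone:
            for r in range(len(row)):
                row[r] = max(floor, (1.0 - evaporation) * row[r])
        # The best ant of each colony reinforces the shared matrix.
        for ant in colony_bests:
            for i, r in enumerate(ant):
                pheromone[i][r] += 1.0
            e = energy(ant, self_e, pair_e)  # unweighted reference energy
            if e < best_e:
                best, best_e = ant, e
    return best, best_e

# Usage on random toy data: 6 residues with 4 candidate rotamers each.
data_rng = random.Random(1)
n_res, n_rot = 6, 4
self_e = [[data_rng.uniform(0, 1) for _ in range(n_rot)] for _ in range(n_res)]
pair_e = [[[data_rng.uniform(0, 1) for _ in range(n_rot)] for _ in range(n_rot)]
          for _ in range(n_res - 1)]
best, best_e = pack_side_chains(
    self_e, pair_e, colony_weights=[(1.0, 1.0), (1.0, 0.5), (0.5, 1.0)])
print(best, round(best_e, 3))
```

Because all colonies deposit into the same matrix, a rotamer favoured under several energy weightings accumulates pheromone faster than one favoured by a single colony, which is one plausible reading of how sharing the matrix combines different energy functions. A real packer would draw rotamers from a rotamer library, use physical energy terms, and follow the search with the gradient-based subrotamer refinement described in the abstract.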
Keywords/Search Tags: Protein, Side Chain, Single Point Mutation, Stability, Disease, Parallel, ACO, GBRT, Bayes, ANN, I-TASSER