
Methodology Studies and Applications of Peptide Structural Characterization and Statistical Modeling

Posted on: 2012-12-09  Degree: Doctor  Type: Dissertation
Country: China  Candidate: F F Tian  Full Text: PDF
GTID: 1114330338996642  Subject: Biomedical engineering
Abstract/Summary:
As one of the most important classes of bioactive substances, peptides play a central role in many physiological and biochemical processes. Using statistical methodology to investigate the structure-function relationship of peptides first requires addressing two aspects, i.e. structural characterization (SC) and statistical modeling (SM). In this dissertation, we propose, improve, introduce, and compare a series of SC and SM methods in the framework of their applications to the statistical analysis of the biological activities and physicochemical properties of peptides.

First, structural characterization was performed to parameterize peptides at three levels: (i) Unit parameterization level: divided physicochemical property scores (DPPS) and the topological scale (T scale) are defined for numerous coded and noncoded amino acids to describe the features of the residue units constituting peptides and peptide analogues. (ii) Sequence parameterization level: amino acid composition descriptors/environment influence descriptors (ACD/EID) are proposed to characterize residue assignment and residue interactions in a peptide sequence. (iii) Structure parameterization level: side-chain conformational space analysis (SCSA) and a quantum mechanics/molecular mechanics-Poisson–Boltzmann/surface area (QM/MM-PB/SA) scheme are designed, based respectively on the combination of self-consistent mean field theory with a rotamer library and on the coupling of direct nonbonding energy calculation with indirect desolvation effect analysis, to dissect the free energy profile of peptide ligands binding to protein receptors on the basis of their complex structure models.

Secondly, we investigate SM methods for modeling peptides as follows: (i) Application of novel methods: Gaussian process (GP) and random forest (RF) are introduced into peptide statistical modeling, and their performance is examined in detail in comparison with traditional methods.
(ii) Proposition of new methods: a genetic algorithm (GA) is employed to perform variable selection for GP modeling, leading to GA/GP; an immune algorithm (IA) and a neural network (NN) are combined, resulting in the immune neural network (INN). (iii) Comparison of different methods: we systematically compare eight different modeling methods in terms of their statistical performance. (iv) Dataset splitting methods: a Monte Carlo sampling-based protocol called SpScore is constructed to best balance the trade-off between the internal diversity and external similarity of the training and test sets. (v) Software development: various statistical modeling methods are integrated into a single Matlab package named ZP-explore to facilitate the use, analysis, and validation of these methods in a uniform environment.

The T scale coupled with partial least squares (PLS) regression, support vector machine (SVM), and immune neural network (INN) is used to perform quantitative sequence–activity model (QSAM) studies of angiotensin-converting enzyme (ACE) inhibitors and elastase substrates. The results indicate that the biological activity of ACE-inhibitory dipeptides is closely related to molecular topological properties, and that the size of the second residue is important for inhibitory potency. In addition, the relationship between the structural features and catalytic kinetic properties of elastase substrates appears to be quite complicated and is primarily governed by the quadratic terms of residues and the interaction terms between residues.

The DPPS and SCSA are used to characterize nonbonding networks across the binding interface of human HLA*A-0201 protein–antigen nonapeptide complexes, and the variables resulting from both methods are confirmed to be effective for statistical modeling of the structure–activity relationship of peptide ligands.
In the analysis of the constructed models, it is concluded that (i) hydrophobicity and hydrogen bonding are the most important to binding, followed by electrostatics, whereas the steric property contributes very little; (ii) the anchor residues P2 and P9 are the most important sites for peptide ligands, followed by P1, P3, and P7, while P4, P5, P6, and P8 play only an insignificant role. In addition, the previously neglected entropy is also demonstrated to be essential in the binding of peptides to HLA*A-0201.

The QM/MM-PB/SA scheme is designed to analyze the complex structures of the OppA protein with its peptide ligands. Energy decomposition, site comparison, and statistical modeling give preliminary insight into the molecular mechanism of the broad specificity of OppA in recognizing diverse peptides: (i) The peptide backbone and its N-terminal residue confer significant stability but little specificity on OppA–peptide binding. (ii) The contribution of desolvation effects at the variable side chains of the central residues to specific binding is largely compensated owing to the presence of the voluminous, adaptable hydrated pocket. (iii) Bulky central residues would incur intensive steric collisions with neighboring OppA atoms. This unfavorable effect could be partially compensated by a favorable desolvation effect if the bulky residue is hydrophobic (nonpolar), or by a favorable long-range electrostatic attraction if the bulky residue is charged.

GA/GP modeling and SpScore splitting are utilized to analyze the binding behavior of decapeptide ligands to the human amphiphysin SH3 domain. In this way, a model that quantitatively predicts the binding affinity is reliably constructed and analyzed in detail. The determined GP hyperparameters indicate that (i) the SH3 domain-binding peptide system involves both linear and nonlinear dependences, and the nonlinear aspect dominates over the linear one.
(ii) Diverse properties contribute to the interactions between the SH3 domain and its peptide ligands. In particular, the steric property and hydrophobicity of P2, the electronic property of P0, and the electronic and hydrogen-bond properties of P-3 in the decapeptide sequence exert a significant effect on the binding affinity of SH3 domain–peptide complexes.

Based upon GA variable selection, the retention behavior of a panel of histidine-rich peptides on an immobilized metal-affinity chromatography (IMAC) column is simulated with several machine learning (ML) methods. The results demonstrate that GA can substantially improve the performance and statistical quality of these ML methods. The optimal GA/GP model reveals that several physicochemical properties influence the retention behavior of peptides in IMAC. In particular, coordination interaction, electrostatic factors, solvent effects, and hydrogen bonding correlate significantly with the retention ability of peptides.

Based upon the ACD/EID descriptor system, eight statistical regression methods are used to predict the normalized retention time (NRT) of peptides generated from the E. coli proteome. Furthermore, these methods are compared comprehensively in terms of fitting ability, stability, predictive power, unbiasedness, interpretability, and efficiency. The results show that Gaussian process (GP) and back-propagation neural network (BPNN) possess the best stability, unbiasedness, and predictive power, and they can accurately model the structure–retention relationships of peptides. Multiple linear regression (MLR) and partial least squares regression (PLS) perform worse than the nonlinear modeling techniques but, being computationally efficient, are promising for qualitative problems involving massive data.
In addition, descriptor importance in the different models is also investigated, and we find that amino acid composition exhibits a significant linear correlation with peptide retention time, whereas the residue environment is mainly nonlinearly correlated with peptide retention.
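
To make the idea of sequence-level parameterization concrete, the following is a minimal Python sketch of a composition-type descriptor. It is not the ACD/EID definition used in the dissertation, which is not given in this abstract. The names composition_descriptor and neighbor_pair_counts are hypothetical, and counting adjacent residue pairs is only an assumed, crude stand-in for "residue environment".

```python
from collections import Counter

# The 20 coded amino acids in one-letter notation.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition_descriptor(sequence):
    """Return the fraction of each coded amino acid in a peptide (20 values)."""
    seq = sequence.strip().upper()
    if not seq:
        raise ValueError("empty peptide sequence")
    unknown = sorted(set(seq) - set(AMINO_ACIDS))
    if unknown:
        raise ValueError(f"unsupported residue codes: {unknown}")
    counts = Counter(seq)
    return [counts[aa] / len(seq) for aa in AMINO_ACIDS]


def neighbor_pair_counts(sequence):
    """Count adjacent residue pairs (hypothetical stand-in for sequence context)."""
    seq = sequence.strip().upper()
    return Counter(seq[i:i + 2] for i in range(len(seq) - 1))


if __name__ == "__main__":
    peptide = "HHAGK"  # made-up example sequence, not taken from the dissertation
    print(composition_descriptor(peptide))
    print(neighbor_pair_counts(peptide))
```

A fixed-length vector such as the one returned by composition_descriptor can be passed to any regression method regardless of peptide length, which is one reason composition-type descriptors suit proteome-scale retention data.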
Keywords/Search Tags: peptide, structural characterization, statistical modeling, machine learning