Construction Of Statistical Potentials Of Local Structure-sequence And Prediction Of RNA-binding Sites

Posted on:2009-11-26

Degree:Doctor

Type:Dissertation

Country:China

Candidate:Q Li

GTID:1100360272462477

Subject:Bioinformatics

Abstract/Summary:

General and transferable statistical potentials were derived to quantify the compatibility between local structures and local sequences of peptide fragments in proteins. In the derivation, structure clusters of fragments are obtained by clustering five-residue fragments in native proteins based on their conformations represented by a local structure alphabet (de Brevern et al., Proteins, 2000, 41, 271-287), their secondary structure states, and their solvent accessibilities. Based on the native sequences of the structurally clustered fragments, the probabilities of different amino-acid sequences were estimated for each structure cluster. From these sequence probabilities, statistical energies as a function of sequence for a given structure were directly derived. The same sequence probabilities were employed in a database-matching approach to derive statistical energies as a function of local structure for a given sequence. Compared with prior models of local statistical potentials, we provide an integrated approach in which local conformations and local environments are treated jointly; structures are treated in units of fragments instead of individual residues, so that coupling between the conformations of adjacent residues is included; and strong interdependences between the conformations of overlapping or neighboring fragment units are also considered. In tests including fragment threading, pseudo-sequence design, and local structure prediction, the potentials performed at least comparably to, and in most cases better than, a number of existing models applicable to the same contexts. This indicates the advantages of such an integrated approach for deriving local potentials and suggests that the statistical potentials derived here are applicable to sequence design and structure prediction.

The interactions between RNA-binding proteins (RBPs) and RNA play key roles in managing some of the cell's basic functions. Computational approaches are being developed to predict RNA-binding residues based on sequence- or structure-derived features. To achieve higher prediction accuracy, improvements to current prediction methods are necessary.

We found that the structural neighbors of RNA-binding and non-RNA-binding residues have different amino acid compositions. Combining this structure-derived feature with evolutionary and other structural information significantly improves the predictions over existing methods. Using a multiple linear regression approach and 6-fold cross-validation, our best model achieves an overall accuracy of 87.8% and a Matthews correlation coefficient (MCC) of 0.47, with a specificity of 93.4%, and correctly predicts 52.4% of the RNA-binding residues in a dataset of 107 non-homologous RNA-binding proteins. Compared with existing methods, including the amino acid compositions of structural neighbors leads to clearly improved predictions.
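The step from per-cluster sequence probabilities to statistical energies, described in the first part of the abstract, can be illustrated with a minimal sketch. It assumes the common inverse-Boltzmann form, energy = -ln(P(sequence | cluster) / P(sequence)), with simple pseudocount smoothing. The abstract does not state the exact functional form, the reference state, or how the probabilities are estimated in the dissertation, so the function name `sequence_energies`, the `pseudocount` parameter, and the whole-fragment counting scheme are illustrative assumptions only.

```python
import math
from collections import Counter


def sequence_energies(cluster_seqs, all_seqs, pseudocount=1.0):
    """Return {sequence: energy} for one structure cluster.

    Assumed form (not confirmed by the abstract):
        energy = -ln( P(seq | cluster) / P(seq) )
    where P(seq) is the background frequency over all fragments.
    """
    cluster_counts = Counter(cluster_seqs)
    background_counts = Counter(all_seqs)
    vocab = set(cluster_counts) | set(background_counts)

    # Pseudocounts keep unseen sequences from producing log(0).
    n_cluster = len(cluster_seqs) + pseudocount * len(vocab)
    n_background = len(all_seqs) + pseudocount * len(vocab)

    energies = {}
    for seq in vocab:
        p_cluster = (cluster_counts[seq] + pseudocount) / n_cluster
        p_background = (background_counts[seq] + pseudocount) / n_background
        energies[seq] = -math.log(p_cluster / p_background)
    return energies


# Toy data: hypothetical five-residue fragments, not taken from the thesis.
all_fragments = ["AAAAA"] * 5 + ["GGGGG"] * 5
cluster_fragments = ["AAAAA"] * 4 + ["GGGGG"] * 1
toy_energies = sequence_energies(cluster_fragments, all_fragments)
```

In this toy case a sequence that is enriched in the cluster relative to the background receives a negative (favorable) energy, and a depleted one receives a positive energy. Counting whole five-residue sequences directly would be very sparse on real data, so a practical estimator would need some form of factorization or smoothing beyond what is shown here.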
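The evaluation figures quoted for the RNA-binding predictor (overall accuracy, specificity, fraction of binding residues correctly predicted, and MCC) are all standard functions of a binary confusion matrix. The sketch below shows how they relate, using their usual textbook definitions. The function name `binary_metrics` and the example counts are hypothetical and are not data from the dissertation.

```python
import math


def binary_metrics(tp, tn, fp, fn):
    """Compute standard per-residue metrics from confusion-matrix counts.

    tp: binding residues predicted as binding
    tn: non-binding residues predicted as non-binding
    fp: non-binding residues predicted as binding
    fn: binding residues predicted as non-binding
    """
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total
    # Sensitivity: fraction of true binding residues that are recovered.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    # Specificity: fraction of true non-binding residues that are recovered.
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    # Matthews correlation coefficient; defined as 0 when the denominator vanishes.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "mcc": mcc,
    }


# Hypothetical counts for illustration only.
example = binary_metrics(tp=40, tn=850, fp=50, fn=60)
```

Because binding residues are typically a small minority of all residues, accuracy alone can look high even when few binding residues are found, which is why MCC and sensitivity are reported alongside it.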

Keywords/Search Tags:

bioinformatics, correlation of local structure-sequence, structural alphabet, statistical potential, protein structure prediction, sequence design, RNA-binding sites

Related items

1	Research On Prediction Of RNA And Protein Binding Sites Based On Sequence And Structural Information
2	Identifying protein-protein binding sites and binding partners using sequence and structure information
3	The Statistical Relationship Between MRNA Sequence, Structure, Energy And Protein Secondary Structure
4	Sequence-Based Predictors For Protein Binding Sites
5	A Study On The Protein Secondary Structure Prediction And The Connection Between Protein Secondary Structure And Its 3D Structure
6	Analysis And Prediction Of Protein Binding Sites Based On Structural Data
7	Research On Protein-protein Binding Sites Prediction Method Based On Sequence Information
8	Research On Protein-ligand Binding Sites Prediction Based On Sequence Information
9	Research On Protein-DNA Binding Sites Prediction Based On Sequence Information
10	Sequence-Based Prediction Of Proteingdp/GDP Binding Sites