| General and transferable statistical potentials to quantify the compatibility between local structures and local sequences of peptide fragments in proteins were derived. In the derivation, structure clusters of fragments are obtained by clustering five-residue fragments in native proteins based on their conformations represented by a local structure alphabet (de Brevern et al., Proteins, 2000, 41, 271-287), secondary structure states, and solvent accessibilities. Based on the native sequences of the structurally clustered fragments, the probabilities of different amino-acid sequences were estimated for each structure cluster. From the sequence probabilities, statistical energies as a function of sequence for a given structure were directly derived. The same sequence probabilities were employed in a database-matching approach to derive statistical energies as a function of local structure for a given sequence. Compared with prior models of local statistical potentials, we provide an integrated approach in which local conformations and local environments are treated jointly, structures are treated in units of fragments instead of individual residues so that coupling between the conformations of adjacent residues is included, and strong interdependences between the conformations of overlapping or neighboring fragment units are also considered. In tests including fragment threading, pseudo-sequence design, and local structure prediction, the potentials performed at least comparably to, and in most cases better than, a number of existing models applicable to the same contexts, indicating the advantages of such an integrated approach for deriving local potentials and suggesting that the statistical potentials derived here are applicable to sequence design and structure prediction. The interactions between RNA-binding proteins (RBPs) and RNA play key roles in managing some of the cell's basic functions. 
Computational approaches are being developed to predict RNA-binding residues based on sequence- or structure-derived features. To achieve higher prediction accuracy, improvements to current prediction methods are necessary. We found that the structural neighbors of RNA-binding and non-RNA-binding residues have different amino acid compositions. Combining this structure-derived feature with evolutionary and other structural information significantly improves the predictions over existing methods. Using a multiple linear regression approach and 6-fold cross-validation, our best model achieves an overall accuracy of 87.8% and a Matthews correlation coefficient (MCC) of 0.47, with a specificity of 93.4%, and correctly predicts 52.4% of the RNA-binding residues in a dataset of 107 non-homologous RNA-binding proteins. Compared with existing methods, including the amino acid compositions of structural neighbors leads to clearly improved predictions. |
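
The performance figures quoted above (overall accuracy, specificity, fraction of binding residues recovered, and MCC) are all standard functions of a binary confusion matrix. The following is a minimal sketch of how such metrics are commonly computed; the function name `binary_metrics` and the counts in the usage example are hypothetical illustrations, not values or code from the study.

```python
from math import sqrt


def binary_metrics(tp, fp, tn, fn):
    """Compute standard per-residue classification metrics.

    tp/fn: binding residues predicted as binding / non-binding.
    tn/fp: non-binding residues predicted as non-binding / binding.
    All names here are illustrative assumptions, not the authors' code.
    """
    def ratio(num, den):
        # Return 0.0 instead of raising when a class is empty.
        return num / den if den else 0.0

    total = tp + fp + tn + fn
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": ratio(tp + tn, total),      # overall correct rate
        "sensitivity": ratio(tp, tp + fn),      # binding residues recovered
        "specificity": ratio(tn, tn + fp),      # non-binding residues recovered
        "mcc": ratio(tp * tn - fp * fn, denom), # Matthews correlation coefficient
    }


# Hypothetical counts, chosen only to exercise the function.
metrics = binary_metrics(tp=50, fp=10, tn=90, fn=50)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Because binding residues are typically a small minority of all residues, accuracy alone can look high even for a weak predictor; MCC uses all four cells of the confusion matrix, which is why it is often reported alongside accuracy for imbalanced data of this kind.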