
Data analysis and knowledge discovery in molecular biology: The protein disorder prediction problem

Posted on: 2000-07-05 | Degree: Ph.D. | Type: Thesis
University: Washington State University | Candidate: Romero M., Pedro Rafael | Full Text: PDF
GTID: 2460390014961516 | Subject: Computer Science
Abstract/Summary:
The standard view in molecular biology is that a fixed three-dimensional structure is a prerequisite for protein biological function. However, the molecular biology literature contains numerous examples of proteins with long disordered regions (LDRs), that is, regions with no fixed three-dimensional structure, that are nevertheless involved in biological function. This prompted us to study the characteristics and prevalence of such LDRs. Our study was complicated by the scarcity of structural information on proteins in general, and on disordered proteins in particular, and by the inability of current structure-determination methods to resolve disordered regions.

One of the most fundamental tenets of molecular biology is that amino acid sequence determines protein structure. This is well established for ordered protein structure. We reasoned that amino acid sequence should determine disorder as well. To test the hypothesis that disorder is also encoded by the amino acid sequence, we investigated whether disorder can be predicted from sequence information alone. Neural network predictors (NNPs) were trained on ordered and disordered regions found in protein structural databases. The finding that these NNPs predict order and disorder with an accuracy well above that expected by chance convincingly demonstrates that disorder, like other types of protein structure, is encoded by the amino acid sequence. Predictions made by these NNPs on major protein sequence databases suggest that LDRs make up a substantial fraction of all protein sequences in nature.

The disordered data were partitioned into protein families, each containing only closely related proteins. Family-specific NNPs trained on these data predicted disorder very differently from one another, implying the existence of different types of disorder.

The complexity of protein sequences has been related to their structural characteristics. Studies of the relationship between sequence complexity and protein disorder suggest that sequence complexity may also be related to the differences between types of disorder.

Taken together, this research implies that protein disorder is a complex and varied phenomenon that deserves to be considered a new category of protein structure.
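Sequence-based disorder predictors of the kind summarized above operate on features computed from windows of the amino acid sequence. The sketch below shows one way such per-window inputs could be built. It assumes a sliding window of hypothetical default length 21 and uses the 20 amino acid composition fractions plus Shannon entropy as a simple stand-in measure of sequence complexity. These particular features, the window length, and the function names are illustrative assumptions, not the inputs or measures used in the thesis.

```python
import math
from collections import Counter

# The 20 standard amino acids (one-letter codes), in a fixed feature order.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def shannon_entropy(segment: str) -> float:
    """Shannon entropy (in bits) of the residue frequencies in a segment.

    Used here as an illustrative complexity measure: low values indicate
    low-complexity, compositionally biased sequence.
    """
    counts = Counter(segment)
    n = len(segment)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def window_features(sequence: str, window: int = 21) -> list[list[float]]:
    """Return one feature vector per sliding window of the sequence.

    Each vector holds 20 composition fractions (in AMINO_ACIDS order)
    followed by the window's Shannon entropy. The default window length
    is a hypothetical choice for illustration.
    """
    if window < 1 or window > len(sequence):
        raise ValueError("window must be between 1 and the sequence length")
    features = []
    for start in range(len(sequence) - window + 1):
        segment = sequence[start:start + window]
        counts = Counter(segment)
        composition = [counts[aa] / window for aa in AMINO_ACIDS]
        features.append(composition + [shannon_entropy(segment)])
    return features
```

For example, `window_features("ACDEFGHIKL", window=5)` yields six 21-element vectors, one per window position; a trained predictor would then map each vector to an order/disorder score for the residue at the window's center.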
Keywords/Search Tags: Protein, Disorder, Molecular biology, Structure, Amino acid sequence, Data