
Construction Of A Protein Supersecondary Structure Database And Statistical Analysis Of Sequences

Posted on: 2008-10-29
Degree: Master
Type: Thesis
Country: China
Candidate: C Y Li
GTID: 2120360218462703
Subject: Computational Mathematics
Abstract/Summary:
Proteins are the basic material of biological activity: almost all biological processes are carried out through protein function. A protein's function is closely tied to its conformation, so knowing its structure is the key to understanding its function. Swiss-Prot (release 8.7) currently holds 3,421,677 protein primary sequences, while the PDB (as of 2006-09-19) holds only 38,882 protein structures. Far fewer proteins have experimentally determined structures than have known primary sequences. The main experimental methods for determining protein structure are X-ray crystallography and NMR spectroscopy, but both are extremely complex and very costly.
Anfinsen proposed that a protein's higher-order structure is determined by its primary sequence. Predicting protein structure from primary sequence, and uncovering the essential properties of biological molecular data, are therefore important research problems in bioinformatics. However, predicting higher-order structure directly from the amino acid sequence is still very difficult, especially for tertiary structure. Some results suggest that folds are mainly built from a number of simple local units, the supersecondary structural motifs. Information gained from supersecondary structure prediction can therefore be used in tertiary structure prediction. Once the conformations of a protein's simple supersecondary motifs are known, predicting its tertiary structure becomes much simpler. Predicting supersecondary structure from primary sequence is thus very important for predicting tertiary structure.
The main work of this thesis is the construction of a protein supersecondary structure database and a statistical analysis of its sequences. A representative set of 6819 proteins sharing less than 40% sequence identity was selected from SCOP release 1.69.
The secondary structure of each residue was taken from the PDB. After analyzing and classifying the protein sequences, we obtained a set of 61,824 supersecondary structures of five types: α-α, α-β, β-α, β-β hairpin and β-β link. These supersecondary structure sequences were further classified by loop length, and a corresponding protein supersecondary structure database was built. We also carried out a statistical analysis of the amino acids in the five types of supersecondary sequences and compared our results with those of related studies, which yielded much useful information about supersecondary structure. Finally, applying the Fisher criterion to classify two kinds of Strand-Loop-Strand sequence segments in protein supersecondary structures gave better classification results.
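The abstract does not spell out how the supersecondary structures were cut out of the annotated chains. What follows is a minimal sketch of one plausible procedure, not the thesis's code. It assumes a per-residue secondary-structure string in which 'H' marks helix, 'E' marks strand and every other symbol marks loop. Each helix or strand is paired with the next one, the pair is typed as α-α, α-β, β-α or β-β, and the number of residues between the two elements gives the loop length. A plain string cannot tell a β-β hairpin from a β-β link, because that requires strand-pairing information. In this sketch that information comes from a hypothetical `hairpin_pairs` argument, for example one derived from DSSP bridge partners. The names `extract_motifs`, `group_by_loop`, `Motif` and `min_len` are likewise illustrative. The thesis may also apply loop-length limits that are not shown here.

```python
from collections import defaultdict
from dataclasses import dataclass
from itertools import groupby


@dataclass(frozen=True)
class Motif:
    kind: str         # "α-α", "α-β", "β-α", "β-β hairpin" or "β-β link"
    loop_length: int  # residues between the two secondary structure elements
    start: int        # 0-based start of the first element
    end: int          # exclusive end of the second element
    fragment: str     # amino acid sequence covering both elements and the loop


def _elements(ss, min_len):
    """Yield (symbol, start, end) for helix/strand runs of at least min_len.

    'H' = helix, 'E' = strand; any other symbol is loop. Runs shorter than
    min_len are also treated as loop (an assumption of this sketch).
    """
    pos = 0
    for symbol, run in groupby(ss):
        n = len(list(run))
        if symbol in "HE" and n >= min_len:
            yield symbol, pos, pos + n
        pos += n


def extract_motifs(ss, sequence, hairpin_pairs=frozenset(), min_len=3):
    """Pair each helix/strand with the next one and classify the pair.

    hairpin_pairs (hypothetical input): set of (start1, start2) residue
    indices of consecutive strands that are hydrogen-bonded partners.
    Consecutive strands not listed there are labelled "β-β link".
    """
    if len(ss) != len(sequence):
        raise ValueError("secondary structure and sequence lengths differ")
    greek = {"H": "α", "E": "β"}
    elements = list(_elements(ss, min_len))
    motifs = []
    for (s1, b1, e1), (s2, b2, e2) in zip(elements, elements[1:]):
        kind = f"{greek[s1]}-{greek[s2]}"
        if kind == "β-β":
            kind += " hairpin" if (b1, b2) in hairpin_pairs else " link"
        motifs.append(Motif(kind, b2 - e1, b1, e2, sequence[b1:e2]))
    return motifs


def group_by_loop(motifs):
    """Index motifs by (type, loop length), mirroring a loop-length split."""
    table = defaultdict(list)
    for m in motifs:
        table[(m.kind, m.loop_length)].append(m)
    return dict(table)


if __name__ == "__main__":
    ss = "LL" + "HHHHH" + "LLL" + "HHHHH" + "LL" + "EEEEE" + "LL" + "EEEEE" + "LLL" + "HHHH"
    seq = ("ACDEFGHIKLMNPQRSTVWY" * 2)[:len(ss)]
    for m in extract_motifs(ss, seq, hairpin_pairs={(17, 24)}):
        print(m.kind, m.loop_length, m.fragment)
```

Keying the database on the pair (type, loop length) makes it easy to pull out, for example, all β-β hairpins with a two-residue loop. It also keeps the loop-length classification separate from motif extraction.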
Keywords/Search Tags: Protein, Supersecondary structure, Database, Statistical analysis, Fisher criterion