Analysis And Construction Of Small Proteins Database

Posted on: 2014-07-19
Degree: Master
Type: Thesis
Country: China
Candidate: C Dai
GTID: 2250330425953288
Subject: Bioinformatics
Abstract/Summary:
Small proteins (polypeptides containing no more than 100 amino acids) occur widely across the three domains of life. They play important roles in metabolism and evolution and span a number of significant functional classes, for example energy metabolism, proteolipids, chaperonins, stress proteins, transporters, transcriptional regulators, nucleases, ribosomal proteins and metal ion chelators. As an essential class of proteins, they perform indispensable biological functions in living organisms. Analyzing the characteristics of small proteins is therefore an important part of bioinformatics research. Although many databases currently hold large amounts of genomic and proteomic data, such as SWISS-PROT, PDB, SCOP and Pfam, few of them integrate the sequence information and characteristics of all small proteins. The research in this thesis aims to fill that gap.

We downloaded data for 2078 species of Archaea, Bacteria and Eukaryota from NCBI Genomes (ftp://ftp.ncbi.nih.gov/genomes/). We collected about 760,000 small proteins and used Perl programs to extract key information such as GI number, accession number, protein sequence, function and classification. We then analyzed the conservation and homology of these proteins using statistical methods. The results show that proteins 60-100 amino acids long make up a large proportion of small proteins, and that the number of small proteins correlates positively with their length. Conservation is closely related to protein length: small proteins with longer sequences tend to be highly conserved. Conservation is also higher in higher organisms than in lower organisms.

We built a dedicated database containing the sequences and characteristics of small proteins from the processed data. The database provides a clear, user-friendly web interface with basic search and download functions. Users can view the primary sequence and characteristics of each small protein. The database also provides the results of the statistical analysis and of the conservation and homology clustering. It offers technical support and integrated data for researchers who study small proteins.
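
The collection step described above (keeping proteins of no more than 100 amino acids and tabulating their lengths) can be illustrated with a minimal sketch. The thesis used Perl programs that are not reproduced here; the Python below is only an assumed equivalent, and the function names, the FASTA input format and the 20-residue bin width are illustrative choices rather than details taken from the thesis.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator, Tuple

# Threshold from the abstract: no more than 100 amino acids.
MAX_SMALL_LEN = 100

Record = Tuple[str, str]  # (header, sequence)


def parse_fasta(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def small_proteins(records: Iterable[Record]) -> Iterator[Record]:
    """Keep only non-empty sequences of at most MAX_SMALL_LEN residues."""
    for header, seq in records:
        if 0 < len(seq) <= MAX_SMALL_LEN:
            yield header, seq


def length_histogram(records: Iterable[Record], bin_width: int = 20) -> Dict[str, int]:
    """Count sequences per length bin, e.g. '1-20', '21-40', ..., '81-100'."""
    counts: Counter = Counter()
    for _, seq in records:
        start = (len(seq) - 1) // bin_width * bin_width + 1
        counts[f"{start}-{start + bin_width - 1}"] += 1
    return dict(counts)
```

To apply this to a downloaded protein file, one could open it and pass the file object to `parse_fasta`, for example `length_histogram(small_proteins(parse_fasta(open("proteins.faa"))))`, where `proteins.faa` is a hypothetical file name. Extracting GI numbers, accession numbers and functional annotations would additionally require parsing the header line, whose layout depends on the NCBI release used.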
Keywords/Search Tags: small proteins, conservation, homology, protein function, database