
Information Theory In Identification Of Signal Peptides

Posted on: 2005-06-04
Degree: Master
Type: Thesis
Country: China
Candidate: L Liu
GTID: 2120360182475900
Subject: Theoretical Physics
Abstract/Summary:
This thesis first describes the characteristics of protein signal peptides and then discusses how signal peptides and their cleavage sites can be identified with mathematical methods. Several existing methods that are considered relatively simple and accurate are reviewed before our information-theoretic method is introduced.

The dataset comes from CBS (Center for Biological Sequence Analysis) in Denmark and was constructed by Dr. Nielsen et al. from SWISS-PROT version 29. It was previously used to develop a method for predicting signal peptides and their cleavage sites. All data sets were homology reduced so that no two sequences within a set are homologous; the result contains 1383 nonhomologous signal sequences and 519 nonhomologous mature protein sequences. The characteristics of eukaryotic and prokaryotic (Gram+ and Gram-) signal peptides are described, and the results again confirm the (-3, -1) rule.

We then use information theory to examine the characteristics of signal peptides and their cleavage sites. First, the average information rate is computed for mature proteins and for signal peptides. The rate for mature proteins changes much less with window width than the rate for signal peptides, which suggests that this difference can be used to locate the cleavage site. Second, we apply information entropy. Each position of the signal peptide is treated as an independent information source with its own entropy. For eukaryotes there are two pronounced entropy minima at positions -1 and -3, indicating that these two positions are particularly distinctive, and the region from -12 to -8 corresponds to an extremum, indicating that the h-region is more distinctive than the c- and n-regions. For Gram+ and Gram- bacteria the minima at positions -1 and -3 are deeper, indicating that their signal peptides follow the (-3, -1) rule more strictly.
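The per-position entropy described above can be illustrated with a minimal Python sketch. It assumes the sequences have already been aligned on the cleavage site and cut to equal-length windows. The function name `positional_entropy` and the toy windows are hypothetical and are not taken from the thesis or its dataset.

```python
import math
from collections import Counter

def positional_entropy(aligned_seqs):
    """Shannon entropy (in bits) of the residue distribution at each column."""
    length = len(aligned_seqs[0])
    if any(len(s) != length for s in aligned_seqs):
        raise ValueError("all sequences must have the same length")
    entropies = []
    for col in range(length):
        # Count how often each residue occurs at this aligned position.
        counts = Counter(seq[col] for seq in aligned_seqs)
        total = sum(counts.values())
        h = -sum((n / total) * math.log2(n / total) for n in counts.values())
        entropies.append(h)
    return entropies

# Made-up windows covering positions -3, -2, -1 (illustration only).
toy_windows = ["ASA", "ALA", "AKA", "VQA"]
H = positional_entropy(toy_windows)
```

In this toy input the last column is fully conserved, so its entropy is zero, while the middle column holds four different residues and reaches 2 bits. A low-entropy column is what the abstract describes as a minimum at positions -1 and -3.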
Finally, a simplified information matrix is introduced and used to identify the cleavage site of signal peptides. Under leave-one-out cross-validation, the success rate reaches 60.1%, 69.2%, and 81.2% for eukaryotes, Gram+ bacteria, and Gram- bacteria, respectively.
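The abstract does not define the simplified information matrix, so the sketch below substitutes a generic log-odds weight matrix with pseudocounts and a uniform background, evaluated by leave-one-out cross-validation. All names (`build_matrix`, `predict_site`, `leave_one_out_accuracy`), the pseudocount, the three-residue window, and the toy sequences are assumptions made for illustration. They are not the author's method or data.

```python
import math
from collections import Counter

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids

def build_matrix(windows, pseudocount=1.0):
    """Log-odds score per (position, residue) against a uniform background."""
    width = len(windows[0])
    background = 1.0 / len(ALPHABET)
    total = len(windows) + pseudocount * len(ALPHABET)
    matrix = []
    for pos in range(width):
        counts = Counter(w[pos] for w in windows)
        matrix.append({
            aa: math.log2((counts[aa] + pseudocount) / total / background)
            for aa in ALPHABET
        })
    return matrix

def predict_site(seq, matrix):
    """Return the start index of the highest-scoring window in seq."""
    width = len(matrix)
    scores = [
        sum(matrix[i][seq[start + i]] for i in range(width))
        for start in range(len(seq) - width + 1)
    ]
    return max(range(len(scores)), key=scores.__getitem__)

def leave_one_out_accuracy(examples, width):
    """examples: list of (sequence, true_window_start) pairs."""
    hits = 0
    for k, (seq, true_start) in enumerate(examples):
        # Train on every example except the one being tested.
        training = [s[t:t + width] for j, (s, t) in enumerate(examples) if j != k]
        matrix = build_matrix(training)
        hits += predict_site(seq, matrix) == true_start
    return hits / len(examples)

# Made-up sequences; each true window covers positions -3..-1.
toy_examples = [
    ("LLLLAKAGG", 4),
    ("MLLVASADD", 4),
    ("KLLLLAQAE", 5),
    ("LLVLALAPP", 4),
]
accuracy = leave_one_out_accuracy(toy_examples, width=3)
```

Each sequence is scored with a matrix built only from the other sequences, which is the property that leave-one-out validation relies on. A real window would likely span more positions on both sides of the cleavage site than the three used here.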
Keywords/Search Tags: signal peptide, cleavage site, weight-matrix, (-1, -3) rule, information theory, information entropy, simplified information-matrix