The Genomes Recognition Of Prokaryote

Posted on: 2008-08-11
Degree: Master
Type: Thesis
Country: China
Candidate: Z J Shen
GTID: 2120360245978407
Subject: Biophysics
Abstract/Summary:
Bioinformatics is a scientific discipline that encompasses all aspects of acquiring, processing, storing, distributing, analyzing and interpreting biological information. It combines the tools and techniques of mathematics, computer science and biology with the aim of understanding the biological significance of a variety of data.

This thesis analyzes DNA sequences in order to find properties that discriminate protein-coding sequences from non-coding sequences, and then designs algorithms that improve the accuracy of protein-coding gene recognition. We analyze the structural properties of genes and ORFs in prokaryotes and extract parameters that describe the properties of genes. We also analyze overlapping genes and then design a self-training algorithm to recognize prokaryotic genes.

The first section gives a general introduction to the background of bioinformatics and to the basic biology relevant to this thesis. The second section describes the theory of Fisher discrimination. The third and fourth sections are the main parts of the thesis; each first briefly reviews previous work and then presents our work in detail. The third section covers protein-coding recognition: when known genes are available, we explore candidate parameters and then design an algorithm to recognize genes. We found two groups of parameters; the first is the unevenness of amino acid usage, and the second is the unevenness of base usage together with the transition probability. The fourth section covers prokaryotic gene finding by self-training: when only the sequence is available, we predict genes with a self-training method. We found seven ways to select ORFs: 1. discarding shorter ORFs and keeping longer ones, 2. the product of information entropies, 3. the frequency of bases, 4. the variance of mismatch, 5. the average self-information of bases (entropy), 6. the joint self-information of the first and second codon positions, 7. the self-information of amino acids. In testing, recognition accuracy improved when the scores of the two groups of parameters were used as two new parameters.
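As a rough illustration of two notions the abstract mentions (candidate ORFs and the average self-information of bases), the following Python sketch may help. It is not the thesis's algorithm: the function names `find_orfs` and `base_entropy`, the `min_len` threshold, the restriction to the forward strand, and the use of ATG as the only start codon are all simplifying assumptions made here for illustration.

```python
import math
from collections import Counter

# Standard stop codons; treating ATG as the only start codon is an
# assumption of this sketch (prokaryotes also use alternatives such as GTG).
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_len=90):
    """Return (start, end) pairs of forward-strand ORFs, end exclusive.

    An ORF here runs from the first ATG in a reading frame to the next
    in-frame stop codon. min_len is a hypothetical length cutoff in bases.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if i + 3 - start >= min_len:
                    orfs.append((start, i + 3))
                start = None
    return sorted(orfs)


def base_entropy(seq):
    """Average self-information (Shannon entropy, in bits) of the bases."""
    counts = Counter(seq.upper())
    total = sum(counts.values())
    return -sum((n / total) * math.log2(n / total) for n in counts.values())
```

A sequence with perfectly even base usage gives an entropy of 2 bits, and any unevenness lowers it, which is why such a quantity could plausibly serve as one score among several when ranking candidate ORFs.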
Keywords/Search Tags: genomes of prokaryote, gene, overlapping gene, recognizing algorithm, gene recognition