Studies On Gene Identification Algorithms And Analysis Of Genome Evolution

Posted on: 2009-02-02
Degree: Doctor
Type: Dissertation
Country: China
Candidate: L Q Zhou
Full Text: PDF
GTID: 1100360245490801
Subject: Applied Mathematics
Abstract/Summary:
With the completion of the Human Genome Project (HGP) and the arrival of the post-genome era, the volume of available biological sequence data is growing exponentially. Analyzing these huge datasets and extracting useful information from them has become an important task in bioinformatics. We have worked on two problems. The first is distinguishing coding segments from non-coding segments in whole prokaryotic genomes, and gene segments from non-gene segments in the complete human genome. The second is analyzing the phylogeny of vertebrates and polyomaviruses using the DNA and protein sequences of whole mitochondrial genomes and polyomavirus genomes.
The dissertation consists of four chapters. Chapter 1, the introduction, presents the basic concepts and research scope of bioinformatics and the organization of bioinformatics data. It also covers the mathematical methods and software widely used for gene identification and species phylogeny.
Chapter 2 addresses the problem of distinguishing coding segments from non-coding segments in whole genomes. We establish mathematical models for DNA sequence data using the theory and methods of fractals, statistics and information theory. We apply existing algorithms and our newly proposed algorithms to distinguish coding and non-coding sequences in prokaryotic genomes and in the human genome. Our aim is to assess the stability and accuracy of these gene-finding algorithms and to explore new methods and ideas for the gene-finding problem. We used a fractal method and a Fourier method to distinguish coding and non-coding sequences in prokaryotic genomes; the average accuracy of the fractal method reaches 78.41%, while that of the Fourier method reaches 86.58%. We also combined the multifractal (MF), regular tetrahedron (RT), Z curve (ZC) and global descriptor (GD) methods to distinguish coding and non-coding sequences in the human genome. The accuracy of this combined approach reaches 83.74%.
Chapter 3 introduces the mathematical models and methods used for phylogenetic analysis. In Chapter 4, we apply these methods to DNA and protein sequences (including 64 vertebrate mitochondrial genomes and 70 parvovirus genomes) to construct phylogenetic trees and analyze the evolutionary relationships among species. For both the 64 vertebrate mitochondrial genomes and the 70 parvovirus genomes, the phylogenetic trees we obtain agree with those obtained using traditional methods. This indicates that the phylogenetic models and methods we propose are reliable and stable, and that they are useful for phylogenetic analysis.
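As a rough illustration of the kind of Fourier approach mentioned above, one widely used idea is that protein-coding DNA tends to show a spectral peak at frequency 1/3, reflecting codon structure. The sketch below is a generic version of that test and is not taken from the dissertation; the function name `period3_ratio` and the choice of a signal-to-noise ratio as the score are assumptions made for illustration only.

```python
import cmath


def period3_ratio(seq):
    """Hypothetical helper: power at frequency 1/3 divided by the mean
    power over all non-zero DFT frequencies, summed over A, C, G, T."""
    seq = seq.upper()
    seq = seq[: len(seq) - len(seq) % 3]  # truncate so n/3 is a DFT bin
    n = len(seq)
    if n < 3:
        raise ValueError("sequence must contain at least 3 bases")

    peak = 0.0         # total power at frequency 1/3
    count_sq = 0       # sum of squared base counts (power at frequency 0)
    valid = 0          # number of positions holding A, C, G or T
    for base in "ACGT":
        positions = [i for i, b in enumerate(seq) if b == base]
        # DFT of the 0/1 indicator sequence for this base at k = n/3
        coeff = sum(cmath.exp(-2j * cmath.pi * i / 3) for i in positions)
        peak += abs(coeff) ** 2
        count_sq += len(positions) ** 2
        valid += len(positions)

    # Parseval: the power summed over all n frequencies equals n * valid,
    # so removing the zero-frequency term gives the non-zero total.
    mean_power = (n * valid - count_sq) / (n - 1)
    if mean_power == 0:
        return 0.0  # e.g. a homopolymer has no power away from frequency 0
    return peak / mean_power


if __name__ == "__main__":
    print(period3_ratio("ATG" * 30))  # strongly 3-periodic toy sequence
    print(period3_ratio("AC" * 45))   # 2-periodic toy sequence
```

A ratio well above 1 suggests a period-3 signal; in practice a threshold would have to be tuned on annotated sequences, and the score would normally be computed over sliding windows rather than on a whole sequence.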
Keywords/Search Tags:Bioinformatics, gene finding, phylogenetic analysis