Application Of Recurrent Iterated Function System To Bacterial Genome Analysis

Posted on: 2009-05-12
Degree: Master
Type: Thesis
Country: China
Candidate: L Shi
GTID: 2120360245990246
Subject: Applied Mathematics
Abstract/Summary:
This thesis consists of two main parts. In the first part, we model the chaos game representation (CGR) of genome sequences using a recurrent iterated function system (RIFS). In the second part, we discuss the phylogenetic relationships of bacteria using linked protein sequences from complete genomes. Because many complete genomes are now available in public databases, more and more researchers have begun to study features and patterns of organisms at the genome level. In 1990, Jeffrey [8] proposed the chaos game representation of DNA sequences. Fisher et al. [15] then extended CGR from DNA sequences to protein sequences and protein structures, and in 2004 Yu et al. [20] first introduced the CGR of linked protein sequences from complete genomes. In this thesis we study the CGR of three kinds of sequences derived from complete genomes: whole-genome DNA sequences, linked coding sequences, and linked protein sequences. Similarity and fractal structure are apparent in these CGR plots.
Two problems are widely studied in fractal geometry and its applications. The first is the forward problem: given a data set and an algorithm, what figure is produced? The second is the inverse problem: given data sets and figures, how can the unknown parameters of a fractal model be estimated? Here we consider the inverse problem. From the complete genomes and their CGRs, we estimate the unknown parameters of the RIFS model by the method of moments, and we evaluate how well the RIFS simulates the CGRs of these three kinds of data using the cumulative walk and a goodness-of-fit measure. We find that the RIFS models the CGR of linked protein sequences from complete genomes very well. Analyzing the structure and regularities of genomic information, finding conserved sequences associated with function, and studying the phylogenetic relationships of organisms by comparing genomes together form one important direction of genome sequence analysis.
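The first part (building a CGR, then fitting an RIFS to it) can be made concrete with a short sketch for DNA. It assumes Python with NumPy, Jeffrey's corner assignment (A bottom-left, C top-left, G top-right, T bottom-right), RIFS maps w_i(x) = (x + c_i)/2 towards each base's corner c_i, and a probability matrix whose entry p_ij is the probability of applying w_j right after w_i. All function names are hypothetical. In particular, `estimate_transition_matrix` uses row-normalized dinucleotide counts as a simple stand-in; it is not the moment method used in the thesis.

```python
import itertools

import numpy as np

# Assumed corner layout (Jeffrey's convention): A bottom-left, C top-left,
# G top-right, T bottom-right of the unit square.
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}
BASES = "ACGT"


def cgr_points(seq):
    """Chaos game: start at the centre and move halfway to each base's corner."""
    pts = np.empty((len(seq), 2))
    x, y = 0.5, 0.5
    for n, base in enumerate(seq):
        cx, cy = CORNERS[base]
        x, y = (x + cx) / 2.0, (y + cy) / 2.0
        pts[n] = (x, y)
    return pts


def cgr_measure(seq, k):
    """Fraction of CGR points in each cell of a 2^k x 2^k grid, indexed [ix, iy]."""
    side = 2 ** k
    pts = cgr_points(seq)
    idx = np.minimum((pts * side).astype(int), side - 1)
    grid = np.zeros((side, side))
    np.add.at(grid, (idx[:, 0], idx[:, 1]), 1.0)
    return grid / len(seq)


def estimate_transition_matrix(seq):
    """Hypothetical stand-in estimator: p[i, j] ~ P(next base j | base i)."""
    counts = np.zeros((4, 4))
    for a, b in zip(seq, seq[1:]):
        counts[BASES.index(a), BASES.index(b)] += 1.0
    rows = counts.sum(axis=1, keepdims=True)
    # Bases that never occur get a uniform row so the matrix stays stochastic.
    return np.where(rows > 0, counts / np.maximum(rows, 1.0), 0.25)


def stationary_distribution(P):
    """Solve pi P = pi subject to sum(pi) = 1 by least squares."""
    A = np.vstack([P.T - np.eye(4), np.ones((1, 4))])
    b = np.zeros(5)
    b[-1] = 1.0
    return np.linalg.lstsq(A, b, rcond=None)[0]


def rifs_measure(P, k):
    """Mass the Markov-driven RIFS assigns to each 2^k x 2^k cell.

    A cell is addressed by the last k maps applied, s_1..s_k (s_k most recent);
    its mass is pi[s_1] * p[s_1, s_2] * ... * p[s_{k-1}, s_k].
    """
    side = 2 ** k
    pi = stationary_distribution(P)
    grid = np.zeros((side, side))
    for word in itertools.product(range(4), repeat=k):
        mass = pi[word[0]]
        for a, b in zip(word, word[1:]):
            mass *= P[a, b]
        # The oldest map contributes the least-significant bit of the cell index.
        ix = sum(int(CORNERS[BASES[s]][0]) << j for j, s in enumerate(word))
        iy = sum(int(CORNERS[BASES[s]][1]) << j for j, s in enumerate(word))
        grid[ix, iy] += mass
    return grid
```

Comparing `cgr_measure(seq, k)` with `rifs_measure(estimate_transition_matrix(seq), k)` cell by cell is one way to see how closely a fitted RIFS reproduces an observed CGR; the cumulative-walk and goodness-of-fit criteria used in the thesis are not implemented here.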
In the second part, we discuss the phylogenetic relationships of bacteria from three aspects. First, we find that the probability matrices estimated for the RIFS reflect some classification information about the organisms, so we discuss the phylogenetic relationships of bacteria using the Euclidean distance between these probability matrices. Second, a correlation distance based on the difference between the original CGR measure and the RIFS-simulated measure yields a more satisfactory phylogenetic tree. Third, a correlation distance based on the ratio of the original CGR measure to the RIFS-simulated measure is also applied to construct a phylogenetic tree.
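The three distances described for the second part can be sketched as follows. This is a hedged illustration, not the thesis's exact formulas: it assumes the "Euclidean distance of probability matrices" is the Frobenius norm of their difference, and that the correlation distance is D = (1 - r)/2 with r the Pearson correlation of two organisms' feature vectors, a common convention in composition-based phylogeny. The per-cell features are the original CGR measure minus, or divided by, the RIFS-simulated measure. Function names and the zero-mass handling in the ratio are hypothetical choices.

```python
import numpy as np


def matrix_distance(P, Q):
    """Euclidean (Frobenius) distance between two RIFS probability matrices."""
    return float(np.linalg.norm(np.asarray(P, dtype=float) - np.asarray(Q, dtype=float)))


def correlation_distance(u, v):
    """Assumed form (1 - Pearson r) / 2, which lies in [0, 1]."""
    r = np.corrcoef(np.ravel(u), np.ravel(v))[0, 1]
    return float((1.0 - r) / 2.0)


def residual_features(original, simulated, mode="difference"):
    """Per-cell comparison of the original CGR measure with the RIFS-simulated one."""
    original = np.asarray(original, dtype=float).ravel()
    simulated = np.asarray(simulated, dtype=float).ravel()
    if mode == "difference":
        return original - simulated
    if mode == "ratio":
        # Hypothetical choice: cells with zero simulated mass map to 0 instead of dividing by 0.
        safe = np.where(simulated > 0, simulated, 1.0)
        return np.where(simulated > 0, original / safe, 0.0)
    raise ValueError(f"unknown mode: {mode}")


def distance_matrix(items, dist):
    """Symmetric pairwise distance matrix over organisms, using any distance function."""
    n = len(items)
    D = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            D[i, j] = D[j, i] = dist(items[i], items[j])
    return D
```

A matrix from `distance_matrix` would then be passed to a distance-based tree method such as neighbor joining; that step is not shown. Note that `correlation_distance` is undefined (NaN) when either feature vector is constant.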
Keywords/Search Tags: genome sequence, chaos game representation, recurrent iterated function system, phylogenetic analysis