
Fractal Characteristics Of DNA Sequence Based On Chaos Game Representation

Posted on: 2009-08-09
Degree: Master
Type: Thesis
Country: China
Candidate: F Xue
Full Text: PDF
GTID: 2120360272975463
Subject: Signal and Information Processing
Abstract/Summary:
With the development of bioinformatics and the rapid accumulation of genomic data, life science has entered the post-genome era, and research is increasingly focused on gene function. Fractal geometry is a branch of nonlinear science, and studying the fractal characteristics of deoxyribonucleic acid (DNA) sequences may reveal information hidden in those sequences over the course of biological evolution. Chaos Game Representation (CGR), proposed by Jeffrey in 1990 as a scale-independent representation of genomic sequences, is based on an iterated function system (IFS) and renders the word (subsequence) distribution statistics of a DNA sequence as a fractal image. The distribution of subsequences can therefore be studied with the tools of fractal theory, and CGR has become an established statistical method for DNA sequence analysis.

Building on the CGR of DNA sequences, this thesis presents a relatively systematic study of the fractal characteristics of DNA sequences. The main conclusions are as follows:

First, n-word subsequence frequency distributions are analyzed using the CGR frequency matrix. For sequences of roughly equal length, the number of n-words that occur exactly once varies with n in the same way. The structure of DNA sequences is also discussed: the relationship between the maximal n-word frequency and n is consistent across many species, as is the relationship between n and the number of n-words that occur exactly once.

Second, the iterated function system underlying the CGR image of a DNA sequence is discussed, and the images of different sequences under different contraction factors are compared. When the contraction factor is large (k = 0.999), similar sequences contract to small figures that resemble one another, whereas the small contracted figures of random DNA sequences differ markedly.
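The CGR iteration and the role of the contraction factor can be sketched as follows. This is a minimal illustration, not the thesis's code: the corner layout (A bottom-left, C top-left, G top-right, T bottom-right) follows the usual convention after Jeffrey, and the contraction factor is assumed to enter as p_i = k·p_{i−1} + (1 − k)·c_i, so that k = 0.5 gives the classic midpoint CGR and k close to 1 makes each step very small. The function name `cgr_points` is hypothetical.

```python
# Corner coordinates for each nucleotide (assumed layout, after Jeffrey 1990):
# A bottom-left, C top-left, G top-right, T bottom-right of the unit square.
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def cgr_points(seq, k=0.5):
    """Return the CGR trajectory of `seq` under contraction factor `k`.

    Each base applies the IFS map p -> k * p + (1 - k) * corner, starting
    from the centre of the square. k = 0.5 is the standard midpoint CGR;
    this parameterisation of k is an assumption, not taken from the thesis.
    """
    if not 0.0 < k < 1.0:
        raise ValueError("contraction factor k must lie in (0, 1)")
    x, y = 0.5, 0.5
    points = []
    for base in seq.upper():
        if base not in CORNERS:
            continue  # skip ambiguity codes such as N
        cx, cy = CORNERS[base]
        x = k * x + (1.0 - k) * cx
        y = k * y + (1.0 - k) * cy
        points.append((x, y))
    return points


if __name__ == "__main__":
    print(cgr_points("AC"))  # → [(0.25, 0.25), (0.125, 0.625)]
    trace = cgr_points("ACGTTGCA" * 50, k=0.999)
    print(len(trace), trace[-1])  # a short path that stays near the centre
```

With k = 0.999 each base moves the point only 0.1% of the way toward its corner, so the image is a small, path-like figure whose shape follows local base composition. This is one plausible reading of the observation that similar sequences contract to similar small figures at large k.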
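The frequency matrix behind the first conclusion can be computed without plotting. On the standard k = 0.5 CGR, a grid of 2^n × 2^n cells assigns exactly one n-word to each cell, so counting overlapping n-words fills the matrix. The sketch below, with hypothetical helper names, also returns the two statistics the thesis tracks: the maximal n-word frequency and the number of n-words that occur exactly once. Counting words directly, rather than binning floating-point CGR points, avoids rounding errors on long sequences.

```python
from collections import Counter

# Integer corner coordinates (assumed usual CGR layout:
# A bottom-left, C top-left, G top-right, T bottom-right).
CORNERS = {"A": (0, 0), "C": (0, 1), "G": (1, 1), "T": (1, 0)}


def nword_counts(seq, n):
    """Count overlapping n-words made only of A, C, G, T."""
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - n + 1):
        word = seq[i:i + n]
        if all(base in CORNERS for base in word):
            counts[word] += 1
    return counts


def fcgr_matrix(counts, n):
    """Place n-word counts on the 2**n x 2**n CGR grid.

    The last base of a word selects the coarsest half of the square, so it
    sets the most significant bit of the column/row index. Row 0 is the
    bottom edge (y = 0).
    """
    size = 2 ** n
    matrix = [[0] * size for _ in range(size)]
    for word, count in counts.items():
        col = row = 0
        for position, base in enumerate(word):
            cx, cy = CORNERS[base]
            col |= cx << position
            row |= cy << position
        matrix[row][col] += count
    return matrix


def word_stats(counts):
    """Return (maximal n-word frequency, number of n-words seen exactly once)."""
    max_freq = max(counts.values(), default=0)
    singletons = sum(1 for c in counts.values() if c == 1)
    return max_freq, singletons


if __name__ == "__main__":
    counts = nword_counts("ACGTACGT", 2)
    print(word_stats(counts))  # → (2, 1): AC, CG, GT twice; TA once
    for row in reversed(fcgr_matrix(counts, 2)):
        print(row)  # top row first, matching the orientation of a CGR image
```

Tracking `word_stats` for n = 1, 2, 3, … on sequences of similar length gives the curves of maximal frequency and singleton count against n that the first conclusion compares across species.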
Next, R/S analysis of DNA sequences based on their CGR images confirms that DNA sequences exhibit long-range correlation.

Then, a method for calculating the fractal information dimension of a DNA sequence from its CGR is proposed. Experimental results show that, within the same species, the information dimension of coding sequences (CDS) is larger than that of non-coding sequences.

A method for calculating DNA sequence similarity is then proposed. Using the absolute difference as the distance measure, it is applied to three groups of DNA sequences with different characteristics. The results show that, within each group, genomic sequences of the same type from different species, as well as different fragments of the same genome sequence, have high similarity.

Finally, the procedure for multifractal analysis of CGR images is studied. The admissible range of the weight factor q and the range of scale invariance are discussed, and q can be chosen in the range −15 ≤ q ≤ 50. Multifractal spectra and generalized dimensions of different sequences are calculated and compared across scales. They capture the fractal characteristics of CGR images of DNA sequences at different levels and can therefore distinguish sequences with more complex structure.
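The R/S (rescaled range) step can be sketched as follows. The abstract does not say which numeric series is extracted from the CGR image, so this sketch assumes the x-coordinate series of the midpoint CGR; `cgr_x_series` and `hurst_rs` are hypothetical names. A Hurst exponent H noticeably above 0.5 is the usual indication of long-range correlation. Classic R/S is biased upward for short windows, and short-range correlation in the CGR series itself can also push H above 0.5, so comparing against shuffled sequences is a sensible control.

```python
import math
import random


def cgr_x_series(seq):
    """x-coordinates of the midpoint CGR (A, C at x = 0; G, T at x = 1)."""
    x, series = 0.5, []
    for base in seq.upper():
        if base in "AC":
            x = 0.5 * x
        elif base in "GT":
            x = 0.5 * x + 0.5
        else:
            continue  # skip ambiguity codes
        series.append(x)
    return series


def rescaled_range(segment):
    """R/S of one segment: range of cumulative deviations over std. dev."""
    n = len(segment)
    mean = sum(segment) / n
    cumulative = low = high = 0.0
    for value in segment:
        cumulative += value - mean
        low = min(low, cumulative)
        high = max(high, cumulative)
    std = math.sqrt(sum((v - mean) ** 2 for v in segment) / n)
    return (high - low) / std if std > 0 else None


def slope(xs, ys):
    """Ordinary least-squares slope of ys against xs."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return num / sum((x - mx) ** 2 for x in xs)


def hurst_rs(series, windows=(8, 16, 32, 64, 128, 256)):
    """Estimate H as the slope of log mean(R/S) against log window size."""
    log_w, log_rs = [], []
    for w in windows:
        values = [rescaled_range(series[i:i + w])
                  for i in range(0, len(series) - w + 1, w)]
        values = [v for v in values if v is not None]
        if values:
            log_w.append(math.log(w))
            log_rs.append(math.log(sum(values) / len(values)))
    return slope(log_w, log_rs)


if __name__ == "__main__":
    rng = random.Random(0)
    seq = "".join(rng.choice("ACGT") for _ in range(8192))
    print(round(hurst_rs(cgr_x_series(seq)), 3))
```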
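The information dimension can be estimated from the same frequency data. This sketch is an assumption about the method, not the thesis's exact formula: boxes of side ε = 2^−n on the midpoint CGR coincide with n-words, so the box probabilities p_i are n-word frequencies, the information is I(ε) = −Σ p_i ln p_i, and D_1 is taken as the least-squares slope of I(ε) against ln(1/ε) over several n. The estimate is bounded by 2, the dimension of the filled unit square. Function names are hypothetical.

```python
import math
import random
from collections import Counter


def nword_probs(seq, n):
    """Relative frequencies of overlapping n-words over the alphabet ACGT."""
    seq = seq.upper()
    counts = Counter(seq[i:i + n] for i in range(len(seq) - n + 1)
                     if set(seq[i:i + n]) <= set("ACGT"))
    total = sum(counts.values())
    return [c / total for c in counts.values()] if total else []


def information(probs):
    """Shannon information I = -sum p ln p (natural log)."""
    return -sum(p * math.log(p) for p in probs)


def information_dimension(seq, scales=(1, 2, 3, 4)):
    """Slope of I(eps) against ln(1/eps), with eps = 2**-n for n in scales."""
    xs = [n * math.log(2) for n in scales]
    ys = [information(nword_probs(seq, n)) for n in scales]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return num / sum((x - mx) ** 2 for x in xs)


if __name__ == "__main__":
    rng = random.Random(0)
    uniform = "".join(rng.choice("ACGT") for _ in range(20000))
    print(round(information_dimension(uniform), 3))  # close to 2
    # A sequence using only A and C stays on the left edge (x = 0),
    # so its estimate should be close to 1.
    two_letter = "".join(rng.choice("AC") for _ in range(20000))
    print(round(information_dimension(two_letter), 3))
```

Comparing this estimate between coding and non-coding sequences of one species is the kind of comparison the thesis reports.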
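The similarity comparison can be sketched under a stated assumption: the abstract does not specify the feature vector, so here each sequence is summarised by its normalised n-word (frequency-matrix) frequencies, and two sequences are compared by the sum of absolute differences (the L1 distance). Smaller values mean higher similarity. `freq_vector` and `abs_difference` are hypothetical names, and n = 3 is an arbitrary default.

```python
import random
from collections import Counter
from itertools import product


def freq_vector(seq, n):
    """Normalised frequencies of all 4**n n-words, in a fixed ACGT order."""
    words = ["".join(p) for p in product("ACGT", repeat=n)]
    seq = seq.upper()
    counts = Counter(seq[i:i + n] for i in range(len(seq) - n + 1))
    total = sum(counts[w] for w in words)  # words containing N are ignored
    return [counts[w] / total if total else 0.0 for w in words]


def abs_difference(a, b, n=3):
    """Sum of absolute differences of the frequency vectors (range 0 to 2)."""
    va, vb = freq_vector(a, n), freq_vector(b, n)
    return sum(abs(x - y) for x, y in zip(va, vb))


if __name__ == "__main__":
    rng = random.Random(42)
    base = [rng.choice("ACGT") for _ in range(2000)]
    mutant = base[:]
    for i in rng.sample(range(len(base)), 20):  # 1% point mutations
        mutant[i] = rng.choice([b for b in "ACGT" if b != base[i]])
    other = "".join(rng.choice("ACGT") for _ in range(2000))
    base, mutant = "".join(base), "".join(mutant)
    print(abs_difference(base, mutant) < abs_difference(base, other))  # True
```

The L1 form keeps every word's contribution on the same scale; other distances on the same vectors would be equally easy to swap in.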
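The multifractal step can be sketched with the standard box-counting partition function. The abstract does not give the thesis's estimator, so this is an assumption. For boxes of side ε = 2^−n, the mass exponent τ(q) is the slope of ln Σ p_i^q against ln ε, and the generalized dimension is D_q = τ(q)/(q − 1), with the information dimension used at q = 1. The spectrum follows from the Legendre transform α = dτ/dq, f(α) = qα − τ(q), here with a central finite difference. Empty boxes are excluded, which matters for negative q. The example evaluates a few q values inside the −15 ≤ q ≤ 50 range; function names are hypothetical.

```python
import math
import random
from collections import Counter


def box_probs(seq, n):
    """Non-empty box probabilities at side 2**-n (one box per n-word)."""
    seq = seq.upper()
    counts = Counter(seq[i:i + n] for i in range(len(seq) - n + 1)
                     if set(seq[i:i + n]) <= set("ACGT"))
    total = sum(counts.values())
    return [c / total for c in counts.values()]


def slope(xs, ys):
    """Ordinary least-squares slope of ys against xs."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return num / sum((x - mx) ** 2 for x in xs)


def tau(seq, q, scales=(1, 2, 3, 4)):
    """Mass exponent: slope of ln sum(p**q) against ln(eps), eps = 2**-n."""
    log_eps = [-n * math.log(2) for n in scales]
    log_chi = [math.log(sum(p ** q for p in box_probs(seq, n))) for n in scales]
    return slope(log_eps, log_chi)


def generalized_dimension(seq, q, scales=(1, 2, 3, 4)):
    """D_q = tau(q) / (q - 1); at q = 1 use the information dimension."""
    if q == 1:  # slope of sum p ln p against ln(eps)
        log_eps = [-n * math.log(2) for n in scales]
        info = [sum(p * math.log(p) for p in box_probs(seq, n)) for n in scales]
        return slope(log_eps, info)
    return tau(seq, q, scales) / (q - 1)


def spectrum_point(seq, q, h=0.01, scales=(1, 2, 3, 4)):
    """(alpha, f(alpha)) via alpha = d tau / dq and f = q * alpha - tau(q)."""
    alpha = (tau(seq, q + h, scales) - tau(seq, q - h, scales)) / (2 * h)
    return alpha, q * alpha - tau(seq, q, scales)


if __name__ == "__main__":
    rng = random.Random(0)
    seq = "".join(rng.choice("ACGT") for _ in range(20000))
    for q in (-15, 0, 1, 2, 50):
        print(q, round(generalized_dimension(seq, q), 3))
    print(spectrum_point(seq, 0.0))
```

For a homogeneous (random) sequence all D_q sit near 2 and the spectrum collapses toward a single point; spread in D_q across q is what signals the more complex structure the thesis uses to distinguish sequences.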
Keywords/Search Tags: DNA sequence, chaos game representation, fractal characteristics, similarity analysis, multi-fractal