New Symmetric Relative Entropy And Similarity Analysis For DNA Sequences

Posted on: 2011-10-19
Degree: Master
Type: Thesis
Country: China
Candidate: J Shen
Full Text: PDF
GTID: 2120360305974580
Subject: Applied Mathematics
Abstract/Summary:
Similarity analysis of DNA sequences is a central research topic in bioinformatics. It can be used to infer the evolutionary relationships between sequences, to construct reasonable phylogenetic trees, and to predict the structure and function of an unknown sequence from known sequences. Such work is both basic and applied research, with scientific significance as well as economic benefit. Within DNA similarity analysis, alignment-free (non-sequence-alignment) methods are becoming increasingly important for studying gene sequences. How to design an effective alignment-free algorithm for gene sequences, classify genes, and study phylogenetic relationships are therefore important problems in bioinformatics.

We briefly describe one family of alignment-free algorithms, the graphical representation approach. Then, working with the distribution of K-tuples, we revise the Kullback-Leibler (K-L) distance and, based on information theory, propose a new symmetric relative entropy (NSRE) and study its properties. It is symmetric, and it can still correctly describe the genetic relationship between species when some words (K-tuples) are missing. We apply it to the following two problems.

(1) The non-random distribution of K-tuples in DNA sequences. The results show that the revised relative entropy accurately measures the difference in K-tuple distribution between native DNA sequences and their randomized counterparts, which indicates that the K-tuple distribution of DNA sequences is not random and has become ordered in the course of evolution. Further study indicates that, as K increases, this difference approximately follows an exponential law. After taking logarithms, we find that the more closely related the species are, the larger the value of the regression equation.
These phenomena show that, for K-tuple distributions, NSRE is the best choice for studying the genetic relationships among different species.

(2) Reconstruction of phylogenetic trees. When NSRE is applied to the complete mitochondrial genomes of 26 species of placental mammals, the resulting phylogenetic trees agree increasingly well, as K increases, with the classification widely accepted by biologists, and the agreement is best at K=6. The tree constructed by this method is more reasonable than those obtained with Euclidean distance, absolute distance, correlation coefficient, cosine distance, and mutual information. We also find that the curves of some species cross in the vicinity of K=6, which means that the NSRE distances describing the relationships between these species reverse order beyond K=6. The reason is that the growing number of missing words reduces the useful information and enlarges the error.
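The abstract does not state the NSRE formula, so the following is only a minimal sketch of the general workflow it describes: count overlapping K-tuples, turn the counts into a distribution, and compare two distributions with a symmetric relative entropy that stays finite when words are missing. As a stand-in for NSRE it uses the Jensen-Shannon divergence, a standard symmetrized form of the K-L distance; this is an assumption, not the thesis's definition. The function names (`ktuple_distribution`, `symmetric_relative_entropy`, `shuffled`) and the toy sequence are hypothetical.

```python
import math
import random
from collections import Counter
from itertools import product


def ktuple_distribution(seq, k):
    """Relative frequencies of all 4**k K-tuples over A, C, G, T.

    Overlapping windows are counted; windows containing other
    characters (e.g. N) are ignored. Missing words get frequency 0.
    """
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    words = ["".join(p) for p in product("ACGT", repeat=k)]
    total = sum(counts[w] for w in words)
    if total == 0:
        raise ValueError("sequence contains no valid K-tuples")
    return [counts[w] / total for w in words]


def symmetric_relative_entropy(p, q):
    """Jensen-Shannon divergence in bits (stand-in for NSRE, an assumption).

    Each distribution is compared with the midpoint m = (p + q) / 2,
    so the result is symmetric and finite even when a word is missing
    from one of the two sequences.
    """
    total = 0.0
    for pi, qi in zip(p, q):
        m = (pi + qi) / 2
        if pi > 0:
            total += 0.5 * pi * math.log2(pi / m)
        if qi > 0:
            total += 0.5 * qi * math.log2(qi / m)
    return total


def shuffled(seq, seed=0):
    """Randomized counterpart of a sequence with the same base composition."""
    chars = list(seq)
    random.Random(seed).shuffle(chars)
    return "".join(chars)


if __name__ == "__main__":
    native = "ACGTACGTTGCAACGTACGGTACCATGACGT" * 20  # toy sequence, not real data
    for k in range(1, 5):
        d = symmetric_relative_entropy(
            ktuple_distribution(native, k),
            ktuple_distribution(shuffled(native), k),
        )
        print(f"K={k}: divergence between native and shuffled = {d:.4f}")
```

For tree reconstruction, the same function could be evaluated for every pair of genomes to fill a distance matrix, which is then passed to a standard tree-building method; which method the thesis uses is not stated in the abstract.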
Keywords/Search Tags: similarity analysis, non-sequence alignment, the distribution of K-tuple, relative entropy, phylogenetic tree