
The Comparison Of Distance Metrics And Dissimilarities For Phylogenetic Analysis Using Complete Genomes

Posted on: 2012-04-29
Degree: Master
Type: Thesis
Country: China
Candidate: X S Chen
GTID: 2210330338971814
Subject: Applied Mathematics
Abstract/Summary:
In this paper we first introduce the distance metrics and dissimilarities commonly used in alignment-free methods for phylogenetic analysis, and discuss whether each is a proper distance in the strict mathematical sense. In fact, some of the so-called "distances" are not proper distance metrics. We then propose three new dissimilarities for use in alignment-free phylogenetic methods with a random background. In the dynamic language approach, we propose a new similarity coefficient to replace the cosine coefficient.

Three genome data sets are employed to evaluate these methods from a biological point of view.

From the biological point of view, the dissimilarities and distances based on the random background have some limitations when used for phylogenetic analysis. The process of eliminating the noise background is necessary in the alignment-free methods based on composition vectors that were developed for phylogenetic analysis using complete genomes. The phylogenetic trees constructed with the Dice coefficient are not better than those constructed with the cosine coefficient. We also find that the process of eliminating the noise background is necessary in the dynamic language approach.
Keywords/Search Tags: complete genome, phylogenetic analysis, composition vector, distance metric, dissimilarity
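
As an illustration of the abstract's point that some commonly used "distances" are not proper metrics, the following is a minimal Python sketch, not the thesis's own code. It assumes the usual real-vector forms of the cosine coefficient and the Dice coefficient, assumes the dissimilarity is taken as one minus the coefficient, and uses three hypothetical two-dimensional vectors in place of real composition vectors. The function names are illustrative only.

```python
import math
from itertools import permutations


def cosine_coefficient(u, v):
    """Cosine of the angle between two real vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def dice_coefficient(u, v):
    """Dice coefficient in its common real-vector form (an assumption here):
    2 * <u, v> / (|u|^2 + |v|^2)."""
    dot = sum(a * b for a, b in zip(u, v))
    return 2.0 * dot / (sum(a * a for a in u) + sum(b * b for b in v))


def dissimilarity(coefficient, u, v):
    """Turn a similarity coefficient into a dissimilarity as 1 - similarity
    (an assumed convention; the thesis may normalize differently)."""
    return 1.0 - coefficient(u, v)


def violates_triangle_inequality(coefficient, points, tol=1e-12):
    """Return True if some triple x, y, z has d(x, z) > d(x, y) + d(y, z)."""
    for x, y, z in permutations(points, 3):
        d_xz = dissimilarity(coefficient, x, z)
        d_xy = dissimilarity(coefficient, x, y)
        d_yz = dissimilarity(coefficient, y, z)
        if d_xz > d_xy + d_yz + tol:
            return True
    return False


# Hypothetical toy vectors standing in for composition vectors.
points = [(1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]

for name, coefficient in [("cosine", cosine_coefficient),
                          ("Dice", dice_coefficient)]:
    print(name, "violates triangle inequality:",
          violates_triangle_inequality(coefficient, points))
```

On these toy vectors both dissimilarities break the triangle inequality, so under the stated assumptions neither is a metric in the strict sense, although both can still be used as dissimilarities for tree construction.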