Feature Extraction Of Genome Sequence And Research On Phylogenetic Tree Method

Posted on: 2012-12-21
Degree: Master
Type: Thesis
Country: China
Candidate: Z Q Yin
GTID: 2230330395985619
Subject: Computer Science and Technology
Abstract/Summary:
With the completion of the Human Genome Project (HGP), the number of available genome sequences is increasing rapidly, and more and more researchers are paying attention to phylogenetic trees built from complete genomes. Much current research uses whole-genome sequences for phylogenetic analysis, because the whole genome reflects all of an organism's biological features and is something all organisms have in common. Studying phylogenetic relationships from the perspective of the genome yields more comprehensive information about evolution. A common view is that a phylogenetic tree based on complete genomes is closer to the species phylogenetic tree. However, researchers find that phylogenetic trees built from different genes do not agree with one another, because species evolution, as seen in a phylogenetic analysis of complete genomes, includes modes of evolution other than vertical inheritance. Consequently, phylogenetic analysis based on genome sequences is of great significance.

This thesis proposes a statistical correlation feature, uses it for similarity analysis of genome sequences, and constructs phylogenetic trees from genome sequences using fuzzy clustering algorithms.

Firstly, a new correlation feature (TBC) is proposed for phylogenetic analysis of genome sequences. It uses differences in the joint probability distribution of trinucleotides and bases to capture the differences between sequences. The TBC feature matrix is normalized, and a fuzzy similarity matrix is formed from it by the exponent Chebyshev distance method. The phylogenetic tree is then constructed using the transitive closure method of fuzzy clustering. The proposed method does not require multiple sequence alignment and is simple to compute.
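The first pipeline (feature extraction, normalization, fuzzy similarity, transitive closure) can be sketched roughly as below. The abstract does not give the formulas, so several details here are assumptions made only for illustration: the TBC value is taken to be P(trinucleotide, next base) minus P(trinucleotide)·P(next base), normalization is min-max scaling per feature, and the "exponent Chebyshev" similarity is taken to be exp(-max_k |x_ik - x_jk|). All function names are hypothetical.

```python
from itertools import product
from math import exp

BASES = "ACGT"
TRIPLETS = ["".join(t) for t in product(BASES, repeat=3)]

def tbc_features(seq):
    """Assumed TBC-style vector (256 values): P(tri, base) - P(tri) * P(base),
    where `base` is the nucleotide immediately following the trinucleotide."""
    seq = seq.upper()
    joint, tri_n, base_n, total = {}, {}, {}, 0
    for i in range(len(seq) - 3):
        tri, base = seq[i:i + 3], seq[i + 3]
        if set(tri + base) <= set(BASES):  # skip windows containing N etc.
            joint[tri, base] = joint.get((tri, base), 0) + 1
            tri_n[tri] = tri_n.get(tri, 0) + 1
            base_n[base] = base_n.get(base, 0) + 1
            total += 1
    if total == 0:
        return [0.0] * (len(TRIPLETS) * len(BASES))
    return [joint.get((t, b), 0) / total
            - tri_n.get(t, 0) / total * base_n.get(b, 0) / total
            for t in TRIPLETS for b in BASES]

def normalize(rows):
    """Min-max scale each feature column to [0, 1]; constant columns become 0."""
    cols = list(zip(*rows))
    lo = [min(c) for c in cols]
    span = [max(c) - min(c) for c in cols]
    return [[(x - l) / s if s else 0.0 for x, l, s in zip(r, lo, span)]
            for r in rows]

def fuzzy_similarity(rows):
    """Assumed 'exponent Chebyshev' form: r_ij = exp(-max_k |x_ik - x_jk|)."""
    return [[exp(-max(abs(a - b) for a, b in zip(x, y))) for y in rows]
            for x in rows]

def transitive_closure(r):
    """Repeat max-min composition R o R until the matrix stops changing."""
    n = len(r)
    while True:
        r2 = [[max(min(r[i][k], r[k][j]) for k in range(n)) for j in range(n)]
              for i in range(n)]
        if r2 == r:
            return r
        r = r2

def clusters_at(closure, lam):
    """Partition indices by the lambda-cut of the closure (values >= lam)."""
    seen, groups = set(), []
    for i in range(len(closure)):
        if i not in seen:
            group = [j for j in range(len(closure)) if closure[i][j] >= lam]
            seen.update(group)
            groups.append(group)
    return groups
```

Calling `clusters_at` with a decreasing sequence of thresholds gives nested partitions, from single sequences down to one group, and those nested partitions can be read as the levels of a tree.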
Phylogenetic trees of 48 Hepatitis E virus, 24 complete coronavirus, 24 complete transferrin, and 20 mammal genomes, respectively, show that this method is effective.

Secondly, a new fuzzy clustering method is proposed for constructing phylogenetic trees from genome sequences. The TBC statistical correlation feature described above is used to build the feature matrix, and a divisive (split) hierarchical clustering method is used to construct the phylogenetic tree. In each split, the fuzzy K-means algorithm partitions the data objects into two classes, and the splitting is repeated until each class contains a single data object. Phylogenetic trees of 20 complete mammal, 24 complete coronavirus, 24 complete transferrin, and 48 Hepatitis E virus genome sequences, respectively, show that this method is effective.
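The divisive procedure in the second method can be illustrated with the minimal sketch below. It assumes standard fuzzy c-means with two clusters and fuzzifier m = 2, Euclidean distance, a deterministic start from the two most distant points, and a hard split at membership 0.5. None of these choices are stated in the abstract, and the function names are hypothetical.

```python
def fuzzy_two_means(points, m=2.0, iters=100, tol=1e-9):
    """Fuzzy c-means with c = 2; returns each point's membership in cluster 0.
    Requires at least two points."""
    n = len(points)

    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    # Assumed initialization: the two most distant points act as centers.
    a, b = max(((i, j) for i in range(n) for j in range(i + 1, n)),
               key=lambda ij: dist2(points[ij[0]], points[ij[1]]))
    centers = [list(points[a]), list(points[b])]
    u = [0.5] * n
    for _ in range(iters):
        new_u = []
        for p in points:
            d0, d1 = dist2(p, centers[0]), dist2(p, centers[1])
            if d0 == 0 and d1 == 0:
                new_u.append(0.5)
            elif d0 == 0:
                new_u.append(1.0)
            elif d1 == 0:
                new_u.append(0.0)
            else:
                # Standard membership update, written for squared distances.
                new_u.append(1.0 / (1.0 + (d0 / d1) ** (1.0 / (m - 1.0))))
        # Update each center as the membership-weighted mean of the points.
        for c, weights in enumerate(([x ** m for x in new_u],
                                     [(1.0 - x) ** m for x in new_u])):
            total = sum(weights)
            if total > 0:
                centers[c] = [sum(w * p[k] for w, p in zip(weights, points)) / total
                              for k in range(len(points[0]))]
        converged = max(abs(x - y) for x, y in zip(new_u, u)) < tol
        u = new_u
        if converged:
            break
    return u

def divisive_tree(labels, points):
    """Split recursively into two classes until each class holds one object."""
    if len(labels) == 1:
        return labels[0]
    u = fuzzy_two_means(points)
    left = [i for i, x in enumerate(u) if x >= 0.5]
    right = [i for i, x in enumerate(u) if x < 0.5]
    if not right:   # degenerate case (e.g. identical points): peel one off
        left, right = left[:-1], left[-1:]
    elif not left:
        left, right = right[:-1], right[-1:]
    return (divisive_tree([labels[i] for i in left], [points[i] for i in left]),
            divisive_tree([labels[i] for i in right], [points[i] for i in right]))
```

For example, `divisive_tree(["a", "b", "c", "d"], [[0, 0], [0, 1], [10, 10], [10, 11]])` returns a nested tuple in which `a` and `b` sit under one branch and `c` and `d` under the other. In practice the points would be rows of the normalized TBC feature matrix.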
Keywords/Search Tags: genome, phylogenetic analysis, correlation feature, phylogenetic tree, fuzzy clustering