
Application Of Inter Distances And Related Methods In Molecular Phylogeny And Metagenome

Posted on: 2018-07-18
Degree: Doctor
Type: Dissertation
Country: China
Candidate: X H Xie
Full Text: PDF
GTID: 1310330518978582
Subject: Statistics
Abstract/Summary:
Traditional methods for constructing phylogenetic trees depend on sequence alignment, which has several disadvantages: the selection of common genes is to some extent arbitrary; there is no agreed standard for the scoring matrices used for nucleotides and amino acids; alignment of distantly related sequences may be invalid; and alignment algorithms are time-consuming, especially for multiple sequence alignment (finding an optimal multiple sequence alignment remains an NP-hard problem). In the genomic era, it is hoped that phylogenetic trees can be reconstructed from whole-genome sequence information. The inter-nucleotide distance is a numerical representation of a sequence. Inspired by this idea, we propose the concepts of inter-amino-acid distance and a novel inter-nucleotide distance, and apply them to phylogenetic tree construction and metagenomics, respectively. In detail, this thesis includes the following.

First, we define the inter-amino-acid distance and apply it to phylogenetic tree reconstruction. We focus on alignment-free phylogenetic analysis using whole-proteome sequences. Based on the inter-amino-acid distances, we first convert the whole-proteome sequences into inter-amino-acid distance vectors, which are called observed inter-amino-acid distance profiles. Then, we propose to use conditional geometric distribution profiles (the distributions of sequences where the amino acids are placed randomly and independently) as the reference distribution profiles. Finally, the relative deviation between the observed and reference distribution profiles is used to define a simple metric that reflects the phylogenetic relationships between the whole-proteome sequences of different organisms. We name our method inter-amino-acid distances and conditional geometric distribution profiles (IAGDP). We evaluate the method on two data sets: a benchmark data set of 29 genomes used in previously published papers, and a data set of 67 mammal genomes. Our results demonstrate that the new method is useful and efficient.

Second, we generalize the concept of inter-amino-acid distance to nucleotide sequences. We propose a novel inter-nucleotide distance for DNA sequences and apply it to the visualization of metagenomic data. We first convert the fragment sequences into inter-nucleotide distance profiles. Then, we analyze these profiles by principal component analysis. Finally, the principal components are used to draw a 2-D scatter plot in which fragments are labeled by their source species. We name our method inter-nucleotide distance profiles (INP). The method is evaluated on three benchmark data sets used in previously published papers: Data set 1 includes 5 genomes, Data set 2 includes 8 genomes, and Data set 3 includes 10 genomes. Our results demonstrate that the new feature extraction method extracts features from DNA sequences in a simple but effective way and is almost free of parameter selection. It therefore provides a good and efficient alternative way to visualize metagenomic data.
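The observed and reference profiles described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis implementation, and it rests on several assumptions that the abstract does not spell out: the inter-symbol distance is taken as the gap between consecutive occurrences of the same symbol; "conditional" is read as a geometric distribution truncated at a maximum distance `max_k` and renormalized; and the relative deviation is taken as (observed - reference) / reference. All function names and the `max_k` parameter are hypothetical.

```python
from collections import defaultdict

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # standard 20-letter alphabet


def inter_symbol_distances(seq, alphabet):
    """Map each symbol to the gaps between its consecutive occurrences."""
    last_seen = {}
    gaps = defaultdict(list)
    for pos, sym in enumerate(seq):
        if sym not in alphabet:
            continue  # skip ambiguous symbols; positions still advance
        if sym in last_seen:
            gaps[sym].append(pos - last_seen[sym])
        last_seen[sym] = pos
    return gaps


def observed_profile(gaps, max_k):
    """Relative frequency of distances 1..max_k (larger gaps are dropped)."""
    kept = [g for g in gaps if g <= max_k]
    if not kept:
        return [0.0] * max_k
    return [kept.count(k) / len(kept) for k in range(1, max_k + 1)]


def reference_profile(p, max_k):
    """Geometric distribution with success probability p, conditioned on
    distance <= max_k (assumed reading of 'conditional')."""
    raw = [p * (1 - p) ** (k - 1) for k in range(1, max_k + 1)]
    total = sum(raw)
    return [r / total for r in raw]


def relative_deviation_vector(seq, alphabet, max_k=4):
    """Concatenate (observed - reference) / reference over all symbols."""
    gaps = inter_symbol_distances(seq, alphabet)
    counted = [s for s in seq if s in alphabet]
    features = []
    for sym in sorted(alphabet):
        p = counted.count(sym) / len(counted) if counted else 0.0
        obs = observed_profile(gaps.get(sym, []), max_k)
        ref = reference_profile(p, max_k) if p > 0 else [0.0] * max_k
        # Guard against a zero reference value (symbol absent, or p == 1).
        features.extend((o - r) / r if r > 0 else 0.0 for o, r in zip(obs, ref))
    return features


# Toy usage: one feature vector per sequence, len(alphabet) * max_k entries.
vector = relative_deviation_vector("MKVLAAGIVKMAAL", AMINO_ACIDS, max_k=4)
print(len(vector))
```

The same functions accept a nucleotide alphabet such as `"ACGT"`, so the resulting vectors could serve as input rows for a principal component analysis in the spirit of the INP method. The distance between two organisms would then be some metric on their feature vectors, and the choice of that metric is left open here.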
Keywords/Search Tags: phylogenetic tree, alignment-free method, inter-nucleotide distance, inter-amino acids distance, conditional geometric distribution