Font Size: a A A

Research On Phylogenetic Network Construction Algorithms

Posted on:2015-10-16Degree:DoctorType:Dissertation
Country:ChinaCandidate:J WangFull Text:PDF
GTID:1220330422992483Subject:Computer Science and Technology
Abstract/Summary:PDF Full Text Request
Phylogenetic analysis is one of the most important topics in bioinformatics. For aset of species, molecular phylogenetic analysis is by analyzing the molecular sequencesof the species, and then constructing a phylogenetic tree or a phylogenetic network forrevealing their evolutionary relationships. Phylogenetic networks are a generalization ofphylogenetic trees. Phylogenetic networks are able to describe reticulate evolutionaryevents (e.g. recombination) that species occurred during evolution, and are able to rep-resent conflicting evolutionary information implied phylogenetic trees. Therefore, howto construct phylogenetic networks is an important research field. In order to study thephylogenetic network construction algorithms, this paper first studies the distance com-pute methods between sequences, then researches the phylogenetic tree construction al-gorithms based on distances and the definition of metrics on the space of phylogeneticnetworks, finally focuses on the phylogenetic network construction algorithms based onrooted phylogenetic trees.The contributions of the dissertation are as follows:(1) A distance compute method, JCV, is proposed for DNA sequences.Given the DNA sequences for a group of species, JCV first codes the DNA se-quences of each species into a characteristic vector, and then computes the distance be-tween species based on these characteristic vectors. The distance matrix obtained can beused to construct phylogenetic trees for phylogenetic analysis. JCV bypasses the com-plexity of performing multiple sequence alignments, so it can compute the distance be-tween DNA sequences with any length and any size. Moreover, the phylogenetic analysisbased on JCV method avoids the ambiguity of building species tree based on a singlegene.(2) A method, FastJoin, based on distances is proposed for constructing phylogenetictrees.Neighbor-Joining (NJ) is a phylogenetic tree construction method based on distancesthat, thanks to its good accuracy and speed, is currently widely used. NJ method is basedon the theory that, for a purely additive distance matrix as input, the two taxa i0and j0,corresponding with the minimum value Si0j0of the sum matrix S computed from the distance matrix, are a pair of true neighbors. Therefore, NJ constructs phylogenetic treesby iteratively picking a pair of taxa to merge as a new taxon until the number of remainingtaxa is≤3. Through an intensive study on NJ, its theory is extended as that, for a purelyadditive distance matrix as input, there is another pair of true neighbors in the set oftaxa, that is, the two taxa corresponding with the minimum value of members that remainafter removing the members in i0th and j0th rows and columns from S, where Si0j0isthe minimum value of S. Based on this theory, an improved NJ algorithm is proposed.Therefore, the improved NJ algorithm constructs phylogenetic trees by iteratively pickingtwo pair of taxa to merge as two new taxa respectively until the number of remaining taxais≤3. By combining the improved NJ algorithm with the search strategy of RapidNJand the external storage style of ERapidNJ, we create a novel method for constructingphylogenetic trees——FastJoin. The experiments prove that FastJoin is very efcient,especially for large data.(3) A metric is defined on the space of partly reduced phylogenetic networks.The evolutionary history of species is traditionally represented with a rooted phylo-genetic tree. When rooted phylogenetic trees are built from several diferent datasets (e.g.from diferent genes), the evolution information implied by those trees is often conflicting.The conflicting information cannot be expressed as a simple phylogenetic tree; however,they can be expressed as a phylogenetic network. In the process of constructing phyloge-netic networks, one needs to compute the distance between phylogenetic networks, suchas the distance between the constructed networks and the simulation networks or the re-al networks. Several measures for quantifying the topological dissimilarity between twophylogenetic networks have been devised, each of which was proven to be a metric ona certain restricted classes of phylogenetic networks. This thesis has defined the spaceof partly reduced phylogenetic networks which contains all phylogenetic networks thathave defined metrics, and defined a metric on the space of partly reduced phylogeneticnetworks.(4) Two methods, LNETWORK and BIMLR, based on rooted phylogenetic trees, areproposed for constructing phylogenetic networks.There is currently a large body of research aimed at developing appropriate methodsfor constructing phylogenetic networks from rooted phylogenetic tree sets. CASS is anefcient algorithm for constructing phylogenetic networks, and the networks constructedby CASS are much simpler than other methods. But it is extremely slow for large datasets or for datasets that need lots of reticulate nodes. Moreover, the networks constructedby CASS are also greatly dependent on the order of input data, i.e. it generally derivesdiferent phylogenetic networks for the same dataset when diferent input orders are used.This thesis first defines the removed taxa based on seed-growing algorithm, and thenproposes two improved CASS algorithms, LNETWORK and BIMLR, based on removed taxaand incompatibility taxa. Experiments show that LNETWORK and BIMLR are significantlyfaster than CASS and efectively weakens the influence of input data order. Moreover, thenetworks constructed by BIMLR and LNETWORK are simpler than other methods, and canreflect the evolutionary information of input data better than other methods。...
Keywords/Search Tags:phylogenetic tree, phlyogenetic network, neighbor-joining, DNA sequence, metric
PDF Full Text Request
Related items