
Developing Integrated Pipelines for Phylogeny and Evolutionary Analyses

Posted on: 2013-07-14
Degree: M.S.
Type: Thesis
University: University of Nebraska at Omaha
Candidate: Servisetti, Santosh
Full Text: PDF
GTID: 2450390008486048
Subject: Biology
Abstract/Summary:
Molecular phylogeny is a fundamental tool for understanding the evolution of, and relationships among, species and genes. Phylogenetic analysis involves several steps, including sequence input, sequence alignment, model testing, and tree construction. These steps are commonly performed separately and sequentially, and they require user intervention at various stages. In addition, character-based phylogenetic methods (e.g., maximum likelihood), although they outperform other methods, are very time consuming and may take weeks or even months to analyze a large dataset. The objectives of this thesis are twofold: 1) to develop a pipeline that integrates all the intermediate methods necessary for phylogenetic analysis and automatically generates user-friendly phylogenetic trees; and 2) to optimize the workflow and phases of the pipeline to reduce computational time.

In this thesis I investigated various alignment, model-testing, and phylogeny tools to assess how reliably they generate results. I identified multiple tools for each stage of phylogenetic analysis and hosted them on supercomputer servers. Once the tools were identified, they were connected sequentially to form a pipeline. The input to this pipeline is a sequence file, which is processed through several phases to generate a phylogenetic tree. Because the pipeline offers multiple tools for every stage, the user chooses two options before uploading the input file: 1) which alignment tool to use; and 2) which phylogenetic tool to use. The alignment stage can be handled by either MUSCLE or MAFFT, model testing by either Modeltest or MrModeltest, and phylogenetic analysis by PAUP, GARLI, or MRBAYES. Evolutionary dynamic parameters can be estimated using BEAST.

I developed a pipeline that links the above tools to perform phylogenetic analysis.
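The stage-by-stage structure described above can be sketched as a small planning function that validates the user's two choices and returns the ordered stages. This is a minimal illustration, not the thesis implementation: the `Stage` type, the `plan_pipeline` function, and especially the `MODEL_TEST_FOR` table that pairs each tree builder with a model-test tool are hypothetical assumptions, since the abstract does not state how the model-test tool is chosen.

```python
from __future__ import annotations

from dataclasses import dataclass

# Tool options named in the abstract for the user-selectable stages.
ALIGNERS = {"MUSCLE", "MAFFT"}
TREE_BUILDERS = {"PAUP", "GARLI", "MRBAYES"}

# HYPOTHETICAL rule: pick the model-test tool from the tree builder.
# The thesis abstract does not specify this mapping; it is an assumption.
MODEL_TEST_FOR = {
    "PAUP": "Modeltest",
    "GARLI": "Modeltest",
    "MRBAYES": "MrModeltest",
}


@dataclass(frozen=True)
class Stage:
    name: str  # pipeline phase, e.g. "alignment"
    tool: str  # tool that runs this phase


def plan_pipeline(aligner: str, tree_builder: str) -> list[Stage]:
    """Validate the user's choices and return the stages in execution order."""
    aligner = aligner.upper()
    tree_builder = tree_builder.upper()
    if aligner not in ALIGNERS:
        raise ValueError(f"unknown alignment tool: {aligner}")
    if tree_builder not in TREE_BUILDERS:
        raise ValueError(f"unknown phylogenetic tool: {tree_builder}")
    return [
        Stage("input", "sequence file"),
        Stage("alignment", aligner),
        Stage("model test", MODEL_TEST_FOR[tree_builder]),
        Stage("tree construction", tree_builder),
    ]


if __name__ == "__main__":
    for stage in plan_pipeline("mafft", "MrBayes"):
        print(f"{stage.name}: {stage.tool}")
```

Separating the plan from execution keeps validation in one place; a real pipeline would then hand each `Stage` to whatever job-submission mechanism the servers use, which this sketch does not model.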
The pipeline automatically chooses an optimal way to perform the analyses based on the phylogenetic tool selected by the user. In addition, I developed a separate approach for BEAST, because it is very time consuming and requires substantial computational resources. For large datasets, even a supercomputer can take several weeks or months to produce results. I employed a divide-and-conquer strategy to generate reasonable subsets of the data and supplied them as input to BEAST. Once all the results were generated, I took the mean of the resulting logs and analyzed the performance of the approach.

Finally, I developed a sequence extraction utility that divides the input dataset into subsets based on the distinct years and distinct locations recorded in the sequence information. This approach produced good results and significantly reduced computational time.
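The divide-and-conquer idea can be sketched in two parts: splitting a FASTA dataset into subsets keyed by year and location, and combining the BEAST runs by averaging a parameter across the per-subset logs. This is a hedged sketch under several assumptions not stated in the abstract: sequence headers follow a hypothetical `accession|location|year` layout; each log is a tab-separated trace with a header row and `#` comment lines, similar to BEAST `.log` files; a 10% burn-in is discarded; and "the mean of the resulting logs" is read as the mean of the per-subset posterior means. All function names are illustrative.

```python
from collections import defaultdict
from statistics import mean


def parse_fasta(text):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)  # sequences may wrap over several lines
    if header is not None:
        yield header, "".join(chunks)


def split_by_year_location(text):
    """Group sequences into subsets keyed by (year, location).

    Assumes HYPOTHETICAL headers of the form 'accession|location|year'.
    """
    subsets = defaultdict(list)
    for header, seq in parse_fasta(text):
        _accession, location, year = header.split("|")
        subsets[(year, location)].append((header, seq))
    return dict(subsets)


def column_mean(log_text, column, burnin=0.1):
    """Mean of one column in a tab-separated trace, after dropping burn-in.

    Lines starting with '#' are treated as comments; the first remaining
    line is the header row. `burnin` is the fraction of samples discarded.
    """
    rows = [r for r in log_text.splitlines() if r.strip() and not r.startswith("#")]
    idx = rows[0].split("\t").index(column)
    values = [float(r.split("\t")[idx]) for r in rows[1:]]
    kept = values[int(len(values) * burnin):]
    return mean(kept)


def combine_subset_logs(log_texts, column, burnin=0.1):
    """Average the per-subset posterior means of one parameter."""
    return mean(column_mean(t, column, burnin) for t in log_texts)


if __name__ == "__main__":
    fasta = ">A1|Omaha|2009\nACGT\n>A2|Omaha|2009\nACGA\n>A3|Lincoln|2010\nACGG\n"
    for key, seqs in sorted(split_by_year_location(fasta).items()):
        print(key, len(seqs), "sequence(s)")
```

Averaging per-subset means treats every subset equally regardless of size; weighting by the number of sequences per subset would be an equally plausible reading of the abstract.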
Keywords/Search Tags: Pipeline, Phylogenetic, Phylogeny, Input, Results, Sequence, Analyses, Several