Construction Of DNA Sequence Phylogenetic Tree Based On Suffix Tree

Posted on:2020-04-24

Degree:Master

Type:Thesis

Country:China

Candidate:X Zhang

Full Text:PDF

GTID:2370330572978472

Subject:Applied Mathematics

Abstract/Summary:

Biological sequence comparison has developed rapidly in recent years, driven mainly by the need to process the huge volumes of data produced by molecular biology. There are two common approaches: sequence alignment and alignment-free comparison. Because whole-genome sequences are long and alignment is computationally expensive, directly analyzing the similarity between sequences by alignment is infeasible in some cases. Alignment-free methods do not compare individual nucleotides; instead, they treat a sequence as a whole, transform it into a mathematical object, and then analyze and compare it with mathematical tools. In this thesis, we use the alignment-free approach to study the similarity of biological sequences.

The suffix tree model stores the suffix identifier at each position in a sequence, and it provides an efficient basis for many lines of research. Researchers in many fields have worked on its practical applications, and it has important applications in biological sequence comparison as well. For example, Leimeister CA et al. used the suffix tree model to locate the longest common substring in order to approximate the length of the longest common substring with k mismatches.

In this thesis, two new dissimilarity measures based on the suffix tree model are proposed. The first is based on the positions in each sequence that correspond to its set of suffix identifiers. The intersection of the suffix identifier sets of the two sequences is taken, and for each sequence the position sets corresponding to all suffix identifiers in the intersection are merged into a union. For each sequence, the ratio of the number of positions in its union to the sequence length is computed, and the dissimilarity is 1 minus the larger of the two ratios. The second measure takes the smaller of the two sequence lengths, uses the suffix tree model to count the unique suffixes common to the two sequences, subtracts that count from the smaller length, and divides the difference by the smaller length.

The proposed measures were tested by reconstructing phylogenetic trees for three data sets: 12 primate sequences, 31 mammalian mitochondrial sequences, and 48 hepatitis E virus sequences. The resulting trees are consistent with current biological classification, and for all three data sets the results obtained with the two new dissimilarity measures are better than, or in good agreement with, those of other methods published in the literature.
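
The two dissimilarity measures described above can be sketched in Python. This is only an illustration of one possible reading of the abstract, not the thesis's implementation, and it rests on several assumptions that the abstract does not confirm: the suffix identifier of a position is taken to be the shortest substring starting there that occurs exactly once in the sequence; the "position set" of an identifier is taken to be every position its occurrence covers; positions with no unique identifier are skipped; and a naive quadratic search stands in for a real suffix tree. All function names are hypothetical.

```python
def count_occurrences(text, pattern):
    """Count occurrences of pattern in text, allowing overlaps."""
    count, start = 0, text.find(pattern)
    while start != -1:
        count += 1
        start = text.find(pattern, start + 1)
    return count


def suffix_identifiers(seq):
    """Map each suffix identifier to the set of 0-based positions where it starts.

    Assumption: the identifier of position i is the shortest substring
    starting at i that occurs exactly once in seq. Positions that have
    no such substring are skipped. A suffix tree would find these
    efficiently; this naive search is for illustration only.
    """
    ids = {}
    n = len(seq)
    for i in range(n):
        for length in range(1, n - i + 1):
            candidate = seq[i:i + length]
            if count_occurrences(seq, candidate) == 1:
                ids.setdefault(candidate, set()).add(i)
                break
    return ids


def covered_positions(ids, common):
    """Union of all positions covered by the common identifiers (assumed reading)."""
    covered = set()
    for ident in common:
        for start in ids[ident]:
            covered.update(range(start, start + len(ident)))
    return covered


def measure_one(a, b):
    """First measure: 1 minus the larger covered-position ratio."""
    ids_a, ids_b = suffix_identifiers(a), suffix_identifiers(b)
    common = ids_a.keys() & ids_b.keys()
    ratio_a = len(covered_positions(ids_a, common)) / len(a)
    ratio_b = len(covered_positions(ids_b, common)) / len(b)
    return 1 - max(ratio_a, ratio_b)


def measure_two(a, b):
    """Second measure: (shorter length - number of common identifiers) / shorter length."""
    common = suffix_identifiers(a).keys() & suffix_identifiers(b).keys()
    shorter = min(len(a), len(b))
    return (shorter - len(common)) / shorter
```

Both functions expect non-empty sequences. Under the identifier definition assumed here, counting only the start positions in the first measure would make it equal to the second, which is why the sketch counts covered positions instead. That choice is a guess about the intended definition. Calling `measure_one` and `measure_two` on every pair of sequences yields a distance matrix that a standard tree-building method can consume.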

Keywords/Search Tags:

Suffix Tree, Alignment-free, Dissimilarity Measure, Phylogenetic Tree

Related items

1	Studied On Gene Sequence Alignment Based On Mixed Suffix Tree And Suffix Array
2	Multiple Sequence Alignment. Bioinformatics Algorithm
3	The Design And Implementation Of A Multiple Sequence Alignment Algorithm Based On Suffix Tree Strategy
4	Phylogenetic Tree Analysis Of DNA Sequences Based On k-mer
5	Construction Phylogenetic Tree From DNA Sequences Based On The Positions Of Common Prefix Identifiers
6	Ultra-large Multiple Sequence Alignment Based On Distributed Computing
7	LM-Suffix: Research On Gene Sequence Index Structure Based On Suffix Tree
8	Alignment-free Sequence Similarity Analysis And Clustering Algorithms On Biological Sequences
9	Bioinformatics Multiple Sequence Alignment And Phylogenetic Spanning Tree Of Several Techniques And Algorithms
10	The Alignment-free Methods And Their Applications For Analysis Of Biological Sequences