
Construction Of DNA Sequence Phylogenetic Tree Based On Suffix Tree

Posted on: 2020-04-24
Degree: Master
Type: Thesis
Country: China
Candidate: X Zhang
GTID: 2370330572978472
Subject: Applied Mathematics
Abstract:
Biological sequence comparison has developed rapidly in recent years. It is mainly used to handle the huge volumes of data produced by the rapid development of molecular biology.

There are two common approaches to biological sequence comparison: alignment-based and alignment-free. Whole-genome sequences are long and alignment is computationally expensive, so in some cases it is not feasible to analyze the similarity between sequences directly by alignment. Alignment-free comparison does not compare individual nucleotides. Instead, it treats a sequence as a whole, transforms it into a mathematical object, and then analyzes and compares these objects with mathematical tools. In this thesis, we use the alignment-free approach to study the similarity of biological sequences.

The suffix tree model stores the suffix identifier at each position of a sequence. Its introduction has provided an efficient foundation for research in many areas, and researchers in many fields worldwide have studied its practical applications. The suffix tree model also has important applications in biological sequence comparison. For example, Leimeister CA et al. used the suffix tree model to find the positions of longest common substrings. They then used these positions to approximate the length of the longest common substring with k mismatches.

In this thesis, two new dissimilarity measures are proposed based on the suffix tree model.

The first measure is based on the set of positions that corresponds to each suffix identifier in a sequence. It takes the intersection of the suffix identifier sets of the two sequences. Then, in each sequence, it merges the position sets of all suffixes in that intersection. For each sequence, the number of positions in this union is divided by the sequence length, and the dissimilarity is 1 minus the larger of the two ratios.

The second measure takes the smaller of the two sequence lengths. It uses the suffix tree model to count the common unique suffixes of the two sequences, and divides the difference between the smaller length and this count by the smaller length.

The proposed methods were tested by reconstructing phylogenetic trees for three data sets: 12 primate sequences, 31 mammalian mitochondrial sequences and 48 hepatitis E virus sequences. The resulting phylogenetic trees are consistent with current biological classification. For all three data sets, the trees reconstructed with the two new dissimilarity measures fall into one of two cases. Either they are better than those obtained by other methods published in the literature, or, after tree transformation, they agree well with the trees those methods reconstruct.
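The abstract describes both dissimilarity measures only in words. The Python sketch below shows one possible reading of them, and several of its details are assumptions not stated in the abstract:

- The suffix identifier of position i is the shortest prefix of the suffix starting at i that occurs only once in the sequence. An end marker `$` is appended before computing it.
- The position set of an identifier is the set of sequence positions its single occurrence covers.
- "The difference" in the second measure is the shorter length minus the number of shared identifiers.

Under these readings, d1 = 1 - max(|U1|/n1, |U2|/n2), where Ui is the union of positions covered in sequence i by the shared identifiers. Likewise, d2 = (m - |C|)/m, where m is the shorter length and C is the set of shared identifiers.

The sketch swaps the suffix tree for a naively sorted list of suffixes. This yields the same identifiers but is only practical for short toy sequences. The names `lcp`, `suffix_identifiers`, `d1` and `d2` are hypothetical.

```python
# ASSUMED definitions (not confirmed by the thesis abstract):
#  - suffix identifier of position i: shortest prefix of (seq + "$")[i:]
#    that occurs only once, i.e. the label a suffix tree shows down to leaf i;
#  - position set of an identifier: the sequence positions its occurrence covers.
# A sorted list of suffixes stands in for the suffix tree (same identifiers).


def lcp(a: str, b: str) -> int:
    """Length of the longest common prefix of a and b."""
    k = 0
    while k < len(a) and k < len(b) and a[k] == b[k]:
        k += 1
    return k


def suffix_identifiers(seq: str) -> dict[str, int]:
    """Map each position's suffix identifier to that start position."""
    text = seq + "$"  # unique terminator: no suffix is a prefix of another
    order = sorted(range(len(text)), key=lambda i: text[i:])
    ids: dict[str, int] = {}
    for rank, i in enumerate(order):
        if i == len(seq):  # the bare "$" suffix is not a sequence position
            continue
        # The longest prefix shared with any other suffix is shared with a
        # neighbour in sorted order; one more character makes it unique.
        shared = 0
        if rank > 0:
            shared = lcp(text[i:], text[order[rank - 1]:])
        if rank + 1 < len(order):
            shared = max(shared, lcp(text[i:], text[order[rank + 1]:]))
        ids[text[i:i + shared + 1]] = i
    return ids


def d1(s1: str, s2: str) -> float:
    """1 minus the larger fraction of positions covered by shared identifiers."""
    if not s1 or not s2:
        raise ValueError("sequences must be non-empty")
    id1, id2 = suffix_identifiers(s1), suffix_identifiers(s2)
    common = id1.keys() & id2.keys()  # intersection of identifier sets
    ratios = []
    for seq, ids in ((s1, id1), (s2, id2)):
        covered: set[int] = set()
        for w in common:
            start = ids[w]
            # Clip so a trailing "$" is not counted as a sequence position.
            covered.update(range(start, min(start + len(w), len(seq))))
        ratios.append(len(covered) / len(seq))
    return 1.0 - max(ratios)


def d2(s1: str, s2: str) -> float:
    """(shorter length - number of shared identifiers) / shorter length."""
    if not s1 or not s2:
        raise ValueError("sequences must be non-empty")
    m = min(len(s1), len(s2))
    shared = suffix_identifiers(s1).keys() & suffix_identifiers(s2).keys()
    return (m - len(shared)) / m


if __name__ == "__main__":
    print(d1("ACGT", "ACGA"), d2("ACGT", "ACGA"))  # prints 0.5 0.5
```

Both functions return 0 for identical sequences and 1 when no identifier is shared. Distinct positions always have distinct identifiers, so the shared count never exceeds the shorter length. A practical implementation would compute the identifiers from a suffix tree (or a suffix array with an LCP array) in near-linear time. It would then feed the resulting distance matrix to a tree-building method, which the abstract does not name.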
Keywords: Suffix Tree, Alignment-free, Dissimilarity Measure, Phylogenetic Tree