
Phylogenetic Tree Analysis Of DNA Sequences Based On k-mer

Posted on: 2020-07-25
Degree: Master
Type: Thesis
Country: China
Candidate: J H Zhang
GTID: 2370330572478472
Subject: Applied Mathematics
Abstract/Summary:
Many studies have addressed classification methods for biological big data. Traditional alignment methods incur large time and space costs when applied to biological sequences, and they require choosing among multiple scoring matrices during the calculation; both make sequence comparison difficult. Researchers have made many contributions toward overcoming these shortcomings. Alignment-free methods solve some sequence-comparison problems on specific genome data sets, but their results depend to some extent on the data studied, and the test data sets used for alignment-free methods generally contain few sequences. Taking k-words as the object of study, this paper proposes an alignment-free method for DNA sequence analysis of large data sets, here meaning data sets of more than 200 sequences. In a comparative study of whole mitochondrial genome DNA sequences of mammals, this classification method gives good evolutionary classification results. The k-word information of a biological sequence is an important feature in many sequence analysis and processing methods. In this paper, the number of Euler circuits is used to constrain the unique value of k determined by the sequence. For a given length k, the paper studies the quantitative relationship between the k-words of two sequences and proposes a similarity measure based on the degree to which the sequences share the same k-words. This measure reflects the overall properties of the k-words in a DNA sequence. Evolutionary trees were built from the mitochondrial DNA sequences of 31, 70 and 236 mammals, and they were consistent with the standard biological classification. On the 31-sequence data set, our method outperforms the three other methods compared. On the 70-sequence data set, it is superior to previous methods that used the same data. For the data set of 236 DNA sequences, no alignment-free method has previously been tested on it, and our method yields better results than the alignment method. Both time complexity and space complexity are greatly reduced.
Keywords/Search Tags: k-mer, Sequence comparison, Similarity measure, Phylogenetic tree, Alignment-free
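
The abstract does not give the formula for its k-word similarity measure, so the following is only a minimal sketch of the general idea of comparing sequences by shared k-words. It assumes a Jaccard index over the sets of distinct k-words as a stand-in for the thesis's measure; the function names (kmer_set, shared_kmer_similarity, distance_matrix) and the choice of k are hypothetical and chosen for illustration.

```python
from itertools import combinations


def kmer_set(seq, k):
    """Return the set of distinct k-words (k-mers) occurring in seq."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def shared_kmer_similarity(a, b, k):
    """Jaccard index of the two k-word sets: |A & B| / |A | B|.

    Assumption: this is a stand-in measure, not the one defined in the thesis.
    """
    ka, kb = kmer_set(a, k), kmer_set(b, k)
    union = ka | kb
    if not union:
        # Both sequences are shorter than k, so there is nothing to compare.
        return 0.0
    return len(ka & kb) / len(union)


def distance_matrix(seqs, k):
    """Pairwise distances (1 - similarity) for a dict of name -> sequence."""
    names = list(seqs)
    dist = {n: {m: 0.0 for m in names} for n in names}
    for n, m in combinations(names, 2):
        d = 1.0 - shared_kmer_similarity(seqs[n], seqs[m], k)
        dist[n][m] = dist[m][n] = d
    return dist
```

A distance matrix of this kind could then be passed to a standard tree-building procedure such as neighbor joining or UPGMA to obtain a phylogenetic tree; which procedure the thesis uses is not stated in the abstract.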