
The Research Of Alignment-free Comparison Methods For DNA Sequences Based On Multiple K Values

Posted on: 2020-08-10
Degree: Master
Type: Thesis
Country: China
Candidate: Y Zhang
Full Text: PDF
GTID: 2370330596968153
Subject: Computer Science and Technology
Abstract/Summary:
With the development of next-generation gene sequencing technology, a large amount of data has been generated in the field of biology. Processing these biological data is an urgent problem, and it is also a major challenge for many other fields such as computer science and mathematics. Bioinformatics emerged in this context. The purpose of sequence comparison is to determine how similar two DNA sequences are and thereby reveal the relationship between the corresponding species. Over the past 50 years, a large number of sequence comparison methods have been proposed. At present, they fall into two main categories: alignment methods and alignment-free methods. Alignment methods often incur a huge time cost and require sequences of fixed length. They cannot process large-scale data and are no longer applicable in the current environment of data explosion. Alignment-free methods usually extract short sequence fragments of length k from a sequence and compute statistical features of these fragments to define sequence similarity. Although alignment-free methods obtain comparison results quickly, they face two urgent problems. First, they rely on the parameter k to extract sequence features, so the value of k has a great influence on the performance of the algorithm. A large number of experiments are often needed to determine the optimal value of k, which makes practical application difficult. Second, the accuracy of these methods still needs to be improved.

This paper aims to solve these two problems by considering multiple k values together. It uses two weighting methods to distinguish the importance of features extracted with different k values and thereby improve the accuracy of the alignment-free method. It also introduces machine learning into the field of sequence comparison, adopting a machine learning model to handle sequence comparison problems.

Based on these two ideas, this paper first improves the traditional alignment-free D2-type method. While integrating multiple k values, two different weighting schemes are applied: the maximum deviation method and a genetic algorithm. Weighting the sequence features improves the accuracy of the traditional D2-type method. Two sequence comparison tasks are designed and implemented. The experimental results show that the proposed method can process large-scale DNA sequences efficiently and accurately without additional time complexity, and that its accuracy is higher than that of previous alignment-free methods.

In addition, a machine learning model for sequence comparison is proposed. Multiple k values are again used to extract sequence features, which are then encoded. A convolutional neural network is used to perform the sequence comparison task. The experimental results show that, compared with previous alignment-free methods, the sequence comparison model based on a convolutional neural network achieves higher accuracy.
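To make the idea of combining several k values concrete, the following is a minimal Python sketch, not the thesis's implementation. It counts overlapping k-mers, computes the classic D2 statistic (the inner product of two k-mer count vectors), and combines a normalised D2 dissimilarity over several k values with weights. The function names, the cosine-style normalisation, and the hand-picked weights are all assumptions made for illustration; in the thesis the weights are obtained with the maximum deviation method or a genetic algorithm, neither of which is reproduced here.

```python
from collections import Counter
import math


def kmer_counts(seq, k):
    """Count overlapping k-mers, keeping only words made of A, C, G, T."""
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if set(word) <= set("ACGT"):
            counts[word] += 1
    return counts


def d2(seq_a, seq_b, k):
    """Classic D2 statistic: inner product of the two k-mer count vectors."""
    ca, cb = kmer_counts(seq_a, k), kmer_counts(seq_b, k)
    return sum(n * cb[word] for word, n in ca.items())


def d2_dissimilarity(seq_a, seq_b, k):
    """Assumed normalisation: 1 - D2 / (|a| * |b|), a value in [0, 1]."""
    ca, cb = kmer_counts(seq_a, k), kmer_counts(seq_b, k)
    dot = sum(n * cb[word] for word, n in ca.items())
    norm_a = math.sqrt(sum(n * n for n in ca.values()))
    norm_b = math.sqrt(sum(n * n for n in cb.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0  # no usable k-mers: treat the pair as maximally dissimilar
    return 1.0 - dot / (norm_a * norm_b)


def multi_k_dissimilarity(seq_a, seq_b, weights):
    """Weighted sum of per-k dissimilarities; weights maps k -> weight."""
    total = sum(weights.values())
    return sum(
        (w / total) * d2_dissimilarity(seq_a, seq_b, k)
        for k, w in weights.items()
    )


if __name__ == "__main__":
    # Placeholder weights chosen by hand, purely for demonstration.
    example_weights = {2: 0.2, 3: 0.3, 4: 0.5}
    a = "ACGTACGTTAGCCGATACGT"
    b = "ACGTTCGTTAGCAGATACGA"
    print(multi_k_dissimilarity(a, b, example_weights))
```

Because each per-k term is computed independently, combining several k values in this way adds only one pass over the sequences per k; how the weights are learned is the part that the thesis's two weighting schemes address.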
Keywords/Search Tags: DNA sequence comparison, maximizing deviation, genetic algorithm, convolutional neural network