The Research Of Alignment-free Comparison Methods For DNA Sequences Based On Multiple K Values

Posted on:2020-08-10

Degree:Master

Type:Thesis

Country:China

Candidate:Y Zhang

Full Text:PDF

GTID:2370330596968153

Subject:Computer Science and Technology

Abstract/Summary:

With the development of next-generation sequencing technology, a large amount of data has been generated in the field of biology. Processing these biological data is an urgent problem, and it is also a major challenge for many other fields such as computer science and mathematics. Bioinformatics emerged in this context.

The purpose of sequence comparison is to measure how similar two DNA sequences are and thereby reveal the relationship between the corresponding species. In the past 50 years, a large number of sequence comparison methods have been proposed. At present, they fall into two main categories: alignment methods and alignment-free methods. Alignment methods often incur a huge time cost and require sequences of fixed length. They cannot process large-scale data and are no longer applicable in the current environment of data explosion. Alignment-free methods usually extract short sequence fragments of length k from a sequence and compute statistical features of these fragments to define sequence similarity. Although alignment-free methods obtain comparison results quickly, they face two urgent problems. First, they rely on the parameter k to extract sequence features, so the value of k has a great influence on the performance of the algorithm. A large number of experiments are often needed to determine the optimal value of k, which makes practical application difficult. Second, the accuracy of these methods still needs to be improved.

This paper aims to solve these two problems by jointly considering multiple k values. It uses two weighting methods to distinguish the importance of the features extracted with different k values and thereby improve the accuracy of the alignment-free method. At the same time, it introduces machine learning into the field of sequence comparison, adopting a machine learning model to handle problems related to sequence comparison.

Based on these two ideas, this paper first improves the traditional alignment-free D₂-type method. While integrating multiple k values, two different weighting schemes are applied: the maximizing deviation method and a genetic algorithm. Weighting the sequence features improves the accuracy of the traditional D₂-type method. Two sequence comparison tasks are designed and implemented. The experimental results show that the proposed method can process large-scale DNA sequences efficiently and accurately without additional time complexity, and that its accuracy is higher than that of previous alignment-free methods.

In addition, a machine learning model for sequence comparison is proposed. Multiple k values are again used to extract sequence features, which are then encoded. A convolutional neural network is used to perform the sequence comparison task. The experimental results show that, compared with previous alignment-free methods, the sequence comparison model based on a convolutional neural network achieves higher accuracy.
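To make the idea of combining several k values concrete, the following is a minimal Python sketch and not the implementation from the thesis. It counts overlapping k-mers, computes the plain D₂ statistic (the inner product of two k-mer count vectors), and takes a weighted sum over several k values. The function names, the cosine-style normalization used to put different k values on a common scale, and the default uniform weights are all assumptions made for illustration. The abstract states that the weights are obtained with the maximizing deviation method or a genetic algorithm, which this sketch does not reproduce.

```python
from collections import Counter
from math import sqrt


def kmer_counts(seq, k):
    """Count every overlapping k-mer (substring of length k) in seq."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def d2(seq_a, seq_b, k):
    """Plain D2 statistic: inner product of the two k-mer count vectors."""
    counts_a = kmer_counts(seq_a, k)
    counts_b = kmer_counts(seq_b, k)
    # A Counter returns 0 for k-mers that never occur in seq_b.
    return sum(n * counts_b[kmer] for kmer, n in counts_a.items())


def normalized_d2(seq_a, seq_b, k):
    """D2 divided by the norms of both count vectors (an assumed scaling),
    so that scores for different k values are comparable."""
    norm_a = sqrt(d2(seq_a, seq_a, k))
    norm_b = sqrt(d2(seq_b, seq_b, k))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a sequence shorter than k has no k-mers
    return d2(seq_a, seq_b, k) / (norm_a * norm_b)


def multi_k_similarity(seq_a, seq_b, ks, weights=None):
    """Weighted sum of normalized D2 scores over several k values.

    weights is a hypothetical input: uniform by default here, whereas the
    thesis learns the weights instead of fixing them.
    """
    if weights is None:
        weights = [1.0 / len(ks)] * len(ks)
    if len(weights) != len(ks):
        raise ValueError("need exactly one weight per k value")
    return sum(w * normalized_d2(seq_a, seq_b, k)
               for k, w in zip(ks, weights))


if __name__ == "__main__":
    a = "ACGTACGTGACG"
    b = "ACGTTCGTGACC"
    print(multi_k_similarity(a, b, ks=[2, 3, 4]))
```

Because each per-k score lies between 0 and 1, weights that sum to 1 keep the combined score in the same range, which makes scores for different sequence pairs directly comparable.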

Keywords/Search Tags:

DNA sequence comparison, maximizing deviation, genetic algorithm, convolutional neural network

Related items

1	Research On Retrieval Algorithm Of Satellite Microwave Remote Sensing Atmospheric Parameters Using Convolutional Neural Network
2	Alignment-free Methods For DNA Sequences Comparison And Their Applications
3	Study Of The Algorithm For ECG Beat Classification Based On Convolutional Neural Network
4	Genome-wide RNA-binding Proteins Identification Based On Evolutionary Deep Convolutional Neural Network
5	Research And Application Of Protein Sequence Ubiquitination Classification Algorithm Based On Convolutional Neural Network
6	Research On Jellyfish Detection Algorithm Based On Convolutional Neural Network
7	Research On The Construction Algorithm Of Gene Regulatory Network Based On Deep Neural Network
8	Research Of Sequence Specificities Based On Convolutional Neural Network
9	Research On Link Prediction Algorithm Based On Deep Convolutional Neural Network
10	Research On Molecular Classification Algorithm Based On Graph Neural Network