
A Hybrid Approach For Accurate Short Read Clustering And Barcoded Sample Demultiplexing In Nanopore Sequencing

Posted on: 2023-10-17
Degree: Master
Type: Thesis
Country: China
Candidate: J H Qi
GTID: 2568306617969609
Subject: Financial mathematics and financial engineering
Abstract/Summary:
In recent years, Oxford Nanopore sequencing technology has greatly advanced the life sciences. The sequencing process is easy to understand. First, double-stranded DNA is unwound into single strands. A single strand then passes through the nanopore, and while it occupies the channel the ionic current changes; different bases (A, T, C, G) produce different current changes, so a current trace is obtained. Finally, pattern recognition algorithms translate the current trace into a base sequence. In simple terms, Oxford Nanopore sequencing can be understood as the conversion of an electrical current into a base sequence.

Based on Oxford Nanopore sequencing, there are two options for single-cell sequencing. The first uses biological methods to isolate individual cells, build sequencing libraries, and then sequence them; it has high cost and low throughput. For this reason a second approach, "single-cell recognition based on tags", has been adopted in recent years. Its idea is to attach a unique DNA sequence (a barcode) to the sequenced sample of each cell. After sequencing, sequences that carry the same barcode are considered to come from the same cell. The process of identifying the source of a sequence from its barcode tag is called demultiplexing.

Solutions to demultiplexing fall into two types: those based on the raw signal generated by the nanopore, and those based on the base sequence translated from the current signal. Sequence-based methods can draw on much previous research, such as sequence alignment and sequence clustering. However, they may not guarantee high-precision demultiplexing, because errors are introduced during translation. Many researchers have therefore started from the raw current signal instead. This avoids the errors introduced by translation and turns demultiplexing into what is essentially a time series clustering or classification problem. Several deep learning methods have recently been proposed that use neural network models to classify the signals, but all of them need manually annotated data for model training. Manual annotation is generally tedious and time-consuming, and annotation errors may occur, which poses a major challenge for these methods.

Through repeated experiments and tests, we combine the ideas of the two types of methods and propose a hybrid clustering method to solve the demultiplexing problem. By "hybrid", we mean that our approach uses both the DNA sequences generated by Oxford Nanopore sequencing and the nanopore signals. Compared with the "black box" processing of deep learning, our method is more interpretable and can achieve comparable accuracy. First, we use the CD-HIT algorithm to complete an initial clustering of the DNA sequences, and from the initial clusters we obtain the consensus sequences and then the consensus signals. Based on the consensus signals and the nanopore signals, the preliminary clustering is completed. Finally, the DTW (dynamic time warping) algorithm is used to refine the results. Because DTW is computationally expensive, we implement a GPU-based parallel acceleration mechanism to improve the efficiency of demultiplexing. Comprehensive experiments show that our method outperforms all traditional clustering tools and achieves accuracy comparable to that of deep learning-based methods. In addition, our method requires no training, is very easy to use, and is an open source project available on GitHub, which makes it easy to discuss and improve.
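The DTW refinement step mentioned above can be illustrated with a minimal sketch. It is not the thesis's implementation: the function names `dtw_distance` and `assign_to_nearest`, the absolute-difference local cost, and the toy current values are all assumptions made for illustration, and the code runs on the CPU rather than reproducing the GPU-based acceleration. It shows only the general idea of assigning a signal to the consensus signal with the smallest DTW distance.

```python
from math import inf


def dtw_distance(a, b):
    """Classic O(len(a) * len(b)) dynamic time warping distance
    between two 1-D sequences, using |x - y| as the local cost."""
    n, m = len(a), len(b)
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three allowed predecessor alignments.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def assign_to_nearest(signal, consensus_signals):
    """Return the key of the consensus signal with the smallest DTW distance
    to `signal`. `consensus_signals` maps a (hypothetical) label to a sequence."""
    return min(
        consensus_signals,
        key=lambda label: dtw_distance(signal, consensus_signals[label]),
    )


# Toy current traces (made-up numbers, not real nanopore data).
consensus = {
    "barcode_A": [1.0, 3.0, 2.0, 5.0],
    "barcode_B": [4.0, 1.0, 4.0, 1.0],
}
read = [1.0, 1.0, 3.0, 2.0, 2.0, 5.0]  # time-stretched copy of barcode_A
label = assign_to_nearest(read, consensus)
```

Because DTW allows one sample to align with several samples of the other sequence, the time-stretched `read` matches `barcode_A` despite the length difference. The quadratic table is filled once per signal and consensus pair, which is the cost that motivates parallel acceleration for large read sets.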
Keywords/Search Tags: Oxford Nanopore sequencing, hybrid clustering, demultiplexing, DTW algorithm, parallel acceleration mechanism