
Research On Analysis Method Of Nanopores Sequencing Data Based On TCN

Posted on: 2021-03-19
Degree: Master
Type: Thesis
Country: China
Candidate: X B Wang
GTID: 2480306047979129
Subject: Master of Engineering
Abstract/Summary:
Since DNA was identified as the genetic material, researchers have studied it continuously and carried out a series of genome sequencing projects. Through continuous innovation, gene sequencing technology has now reached its fourth generation. The latest approach, nanopore sequencing, identifies the nucleotide sequence of a DNA strand from the changes in the current signal as the strand passes through a nanopore. It requires no complicated pretreatment of the DNA strand and offers low cost, high speed, portability, real-time operation, and long read lengths, but it suffers from low accuracy. Existing methods for recognizing nucleotide sequences from nanopore sequencing signals fall into two groups: statistical methods, represented by the Hidden Markov Model (HMM), and deep learning methods, represented by the LSTM recurrent neural network. The former has a simple structure, so its recognition accuracy is not ideal. The latter does not analyze the time-series input signal sufficiently, so its recognition performance still leaves considerable room for improvement.

In view of this, this thesis studies nucleotide sequence recognition for nanopore sequencing signals using a temporal convolutional network (TCN). First, lambda phage sequencing data from the MinION nanopore sequencer were preprocessed and screened by quality score and sequence length. The reference genome was then obtained from the NCBI database to form the data set for further research. Next, a recognition model was built on the TCN framework to recognize the nucleotide sequence from the time-series current signal produced by the DNA strand under test. Because the data are temporal, causal convolution was adopted during model construction, and dilated convolution was adopted to solve the problem of a small receptive field. In the output stage, the decoding mechanism of connectionist temporal classification (CTC) was incorporated: by introducing a blank placeholder and a probabilistic calculation method, it solves the problem of end-to-end prediction of unaligned sequences without segmentation. The model parameters were optimized using the associated loss function.

To cope with long input sequences, an attention mechanism was introduced that quickly filters high-value information out of a large amount of information, and batch normalization was adopted to address internal covariate shift (ICS) and further improve model performance. Finally, an integrated model combining the basic model and the attention mechanism was built using fusion theory. Compared with the existing model, the designed model was shown to produce more accurate recognition results.
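The two building blocks named in the abstract, causal dilated convolution and CTC decoding with a blank placeholder, can be illustrated with a minimal pure-Python sketch. This is not the thesis implementation: the function names, the `-` blank symbol, the kernel size, and the dilation values are all assumptions chosen for illustration, and the decoder shown is the simple best-path (greedy) variant rather than whatever decoding the thesis actually uses.

```python
from itertools import groupby

BLANK = "-"  # hypothetical blank symbol; the abstract does not name one


def causal_dilated_conv(signal, weights, dilation):
    """1-D causal convolution: output[t] uses only signal[t], signal[t - d], ...

    Samples before the start of the signal are treated as zero (left padding),
    so no output ever depends on a future input sample.
    """
    out = []
    for t in range(len(signal)):
        acc = 0.0
        for k, w in enumerate(weights):
            idx = t - k * dilation
            if idx >= 0:
                acc += w * signal[idx]
        out.append(acc)
    return out


def receptive_field(kernel_size, dilations):
    """Input samples visible to one output of a stack of dilated causal layers."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


def ctc_greedy_decode(labels_per_step, blank=BLANK):
    """Best-path CTC decoding: merge repeated labels, then drop blanks."""
    collapsed = [label for label, _ in groupby(labels_per_step)]
    return "".join(label for label in collapsed if label != blank)


if __name__ == "__main__":
    # With dilation 2, each output sums the current sample and the one 2 steps back.
    print(causal_dilated_conv([1, 2, 3, 4, 5], [1, 1], dilation=2))
    # Doubling the dilation per layer grows the receptive field quickly.
    print(receptive_field(kernel_size=3, dilations=[1, 2, 4, 8]))
    # The blank between the two A runs keeps them as two separate bases.
    print(ctc_greedy_decode("AA-A-CC--G-TT"))
```

In this sketch, four layers with kernel size 3 and dilations 1, 2, 4, 8 see 31 input samples, which shows why dilation helps with a small receptive field. The per-step labels `AA-A-CC--G-TT` decode to `AACGT`, which shows how the blank lets CTC emit repeated bases without a pre-segmented signal.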
Keywords/Search Tags: Nanopore sequencing, Basecalling, Neural network, Model optimization