
Data Analysis And Decoding Algorithm For The Real Time Pyrosequencing Based On Cyclical Dual Mononucleotide Addition

Posted on: 2017-01-20
Degree: Master
Type: Thesis
Country: China
Candidate: R X Zhang
GTID: 2310330491462426
Subject: Biomedical engineering
Abstract/Summary:
High-throughput sequencing technology plays an important role in life science research. Compared with the Sanger sequencing method, however, it falls short in read length and accuracy. Pyrosequencing based on dual mononucleotide addition, a method developed by the State Key Laboratory of Bioelectronics at Southeast University, can achieve longer read lengths and higher accuracy. The raw results of this method cannot be used directly, so a decoding algorithm is required to translate the raw signals into a sequence.

One task of this thesis is therefore to establish the basic decoding model. First, we establish a data-processing model in which the raw signals are converted into a count matrix. We then build the basic decoding model on this count matrix, using a comparison method to recover the base type at each position of the target sequence one by one. Both the demonstration and the simulation results indicate that the basic decoding model can decode the raw signals and recover the whole target sequence correctly.

Because sequencing errors cannot be avoided, a second task is to establish an error-tolerant decoding model that reduces their impact. First, we design a basic error-tolerant decoding model that corrects errors when the length of the code is less than five. An iterative algorithm revises the count matrix over several iterative steps, and an enumeration algorithm analyzes each possible solution in each step; results with wrong solutions are discarded. Second, we extend the basic error-tolerant decoding model and improve the filtering criteria. Simulation results show that the improved model performs better and is more robust to errors. However, when several errors occur close together in the raw signal profiles, the performance of the model decreases.

To detect SNPs in pooled DNA sequences, we construct a model of the difference between the signal profiles of the mutant and wild-type sequences, and present a preliminary SNP detection strategy and algorithm. To re-sequence known sequences with this technology, we apply a greedy algorithm to design the dual mononucleotide additions so as to minimize the number of reactions.
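As a rough illustration of the ideas summarized above, the following Python sketch simulates ideal signal profiles and decodes them. It is not the count-matrix model or the error-tolerant algorithm of the thesis, whose details are not given in this abstract. It rests on assumptions of our own: each run alternates one nucleotide pair with the complementary pair, each signal is an error-free integer count of incorporated bases, and two runs with different pairings are available. The names `flowgram`, `membership`, `decode` and `greedy_additions` are hypothetical.

```python
from itertools import combinations

BASES = "ACGT"

def _check(seq):
    """Reject symbols other than A, C, G, T so the loops below always advance."""
    bad = set(seq) - set(BASES)
    if bad:
        raise ValueError(f"unexpected symbols: {sorted(bad)}")

def flowgram(seq, pair):
    """Ideal signals for a run that alternates `pair` with the other two bases.

    Each value is the number of consecutive bases of `seq` incorporated in one
    reaction (assumption: noise-free, integer-valued signals).
    """
    _check(seq)
    first = set(pair)
    second = set(BASES) - first
    signals, pos, mix = [], 0, first
    while pos < len(seq):
        n = 0
        while pos < len(seq) and seq[pos] in mix:
            n += 1
            pos += 1
        signals.append(n)  # may be 0 if the next base needs the other mix
        mix = second if mix is first else first
    return signals

def membership(signals, pair):
    """Expand a signal profile into one candidate base set per position."""
    first = frozenset(pair)
    second = frozenset(BASES) - first
    sets = []
    for i, n in enumerate(signals):
        sets.extend([first if i % 2 == 0 else second] * n)
    return sets

def decode(signals_1, pair_1, signals_2, pair_2):
    """Recover the sequence position by position from two runs.

    Assumes pair_1 and pair_2 come from different pairings (e.g. "AG" and
    "AC"), so that the two candidate sets share exactly one base.
    """
    sets_1 = membership(signals_1, pair_1)
    sets_2 = membership(signals_2, pair_2)
    if len(sets_1) != len(sets_2):
        raise ValueError("profiles imply different read lengths")
    decoded = []
    for a, b in zip(sets_1, sets_2):
        (base,) = a & b  # fails if the pairings are not distinct
        decoded.append(base)
    return "".join(decoded)

def greedy_additions(seq):
    """For a known sequence, pick at each reaction the pair that reads furthest."""
    _check(seq)
    additions, pos = [], 0
    while pos < len(seq):
        best_pair, best_end = None, pos
        for pair in combinations(BASES, 2):
            end = pos
            while end < len(seq) and seq[end] in pair:
                end += 1
            if end > best_end:
                best_pair, best_end = "".join(pair), end
        additions.append((best_pair, best_end - pos))
        pos = best_end
    return additions

target = "AAGCTTAC"
run_1 = flowgram(target, "AG")
run_2 = flowgram(target, "AC")
print(run_1, run_2)
print(decode(run_1, "AG", run_2, "AC"))
print(greedy_additions(target))
```

In this simplified setting each position is fixed by intersecting the two candidate sets, which is one possible reading of recovering the base type "one by one". Real signals are noisy, which is why an error-tolerant model is needed in practice.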
Keywords/Search Tags: high-throughput sequencing, pyrosequencing based on dual mononucleotide addition, decoding model, SNP detection, pooled sequencing