
An Error-correction Algorithm For Next Generation Sequencing Based On Mutual Information And Expectation-maximization

Posted on: 2018-12-01
Degree: Master
Type: Thesis
Country: China
Candidate: Z W Tan
Full Text: PDF
GTID: 2310330542964613
Subject: Computer technology
Abstract/Summary:
Next Generation Sequencing (NGS), owing to its high sequencing speed and low sequencing cost, has gradually replaced traditional sequencing technology and become the preferred sequencing method in bioinformatics. However, NGS often produces short reads with low accuracy. NGS therefore relies heavily on error-correction tools to correct errors and improve the accuracy of sequencing results. In recent years, with the development of computer technology, applying computational methods to correct the derived reads has become standard practice. Among next-generation sequencing platforms, Illumina has become the most popular because of its relatively low cost and better quality compared with other platforms. However, due to the limits of the sequencing technology, the accuracy of the Illumina platform gradually decreases as the sequencing length increases, which makes it difficult to ensure the accuracy of sequencing results.

Against this background, this paper develops a gene error-correction algorithm based on mutual information and expectation-maximization for the Illumina sequencing platform, in order to improve the quality of the sequencing results. The paper first introduces the background and research status of next-generation sequencing technology and of gene error-correction methods, and describes the relevant technologies and algorithms in detail. It then proposes the error-correction algorithm based on mutual information and expectation-maximization and applies it to correct sequencing results.

Next-generation sequencing also yields a large set of k-mer sequences. K-mer sequences play an important role in error correction, and many error-correction algorithms rely on them. However, the number of k-mer sequences produced by sequencing is very large. If these k-mer sequences are not stored properly, the speed and accuracy of the subsequent error-correction process will suffer. To address this, this paper proposes storing the k-mer sequences in a Bloom Filter data structure, to reduce the access time and storage space of the sequences.

The proposed algorithm is compared with other representative gene error-correction algorithms. The experiments show that the algorithm proposed in this paper can improve the accuracy of error correction. Meanwhile, using the Bloom Filter data structure to store the k-mer sequences effectively reduces the time complexity and space complexity of error correction.
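
The abstract does not describe how the thesis implements its k-mer store, so the following is only a minimal Python sketch of the general idea: splitting reads into k-mers and recording them in a Bloom filter. The class `BloomFilter`, the helper `kmers`, the double-hashing scheme over SHA-256, and all parameter values (`num_bits`, `num_hashes`, `k=5`, the example reads) are assumptions made for illustration, not the author's design.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter: a bit array probed by several hash positions."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Double hashing: derive num_hashes positions from one SHA-256 digest.
        digest = hashlib.sha256(item.encode("ascii")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force odd step
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        # True means "possibly present"; False means "definitely absent".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


def kmers(read, k):
    """Yield every length-k substring of a read, left to right."""
    for i in range(len(read) - k + 1):
        yield read[i:i + k]


# Hypothetical example reads, chosen only to exercise the code.
reads = ["ACGTACGTGA", "CGTACGTGAC"]
bf = BloomFilter(num_bits=1 << 16, num_hashes=4)
for read in reads:
    for kmer in kmers(read, k=5):
        bf.add(kmer)

print("ACGTA" in bf)  # an inserted k-mer is always reported as present
```

A Bloom filter never reports an inserted k-mer as absent, but it can occasionally report an unseen k-mer as present, and this plain form records membership only, not k-mer counts. The memory saving comes from storing a fixed-size bit array instead of the k-mer strings themselves.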
Keywords/Search Tags:Next Generation Sequencing, Error-correction, Bloom Filter, Mutual Information, Expectation-maximization algorithm