
Research on a Correction Method for New-Generation High-Throughput Sequencing Data

Posted on: 2019-06-27 | Degree: Master | Type: Thesis
Country: China | Candidate: S Y Li
GTID: 2370330548492941 | Subject: Control Science and Engineering
Abstract:
With the rapid development of information detection technology, high-throughput sequencing has taken life-science research to a new level. Illumina and Ion Torrent are currently the two main high-throughput sequencing platforms. Illumina performs sequencing by synthesis using reversible terminators and fluorescently labeled dNTPs, whereas Ion Torrent uses a semiconductor chip whose sensors convert the pH change produced by the synthesis reaction into a voltage signal. Ion Torrent measures one type of base per flow, so when several identical bases occur in a row (a homopolymer), the length of the run is uncertain; Illumina reads only one base at a time and does not have this problem. Conversely, because Illumina detects bases optically, it can call the wrong base type, whereas Ion Torrent, which detects a chemical reaction, does not misidentify the base type. Because their sequencing principles differ so much, the two platforms are highly complementary. Exploiting this complementarity, this thesis proposes a method for cross-correcting Illumina and Ion Torrent high-throughput sequencing data.

First, a logic-analysis method is designed to correct base-type errors in Illumina data and homopolymer-length errors in Ion Torrent data. It corrects the two data sets according to three principles: the homopolymer lengths in Illumina data are correct; the base types in Ion Torrent data are correct; and Illumina and Ion Torrent will not both be wrong at the same site. On this basis, the two groups of sequencing data were preprocessed and the causes of errors in the Illumina data were analyzed. Next, the correction principles for the two kinds of data were compared and analyzed. Finally, the data were corrected according to these principles.

Because Illumina sequencing is comparatively expensive, a separate comprehensive model based on a neural network and a dynamic programming algorithm was also designed to correct homopolymer-length errors in Ion Torrent data directly. For this method, the factors that cause homopolymer-length detection errors in Ion Torrent data are first analyzed, and an error-correction model based on a multi-layer neural network is designed. Then, to improve recognition accuracy, reference genome information is introduced, and the multi-layer neural network is combined with a dynamic programming algorithm into a comprehensive model that corrects Ion Torrent data directly. Experimental results show that the proposed methods effectively correct base-type errors in Illumina data and homopolymer-length errors in Ion Torrent data.
Keywords: Homopolymer, Cross-correction algorithm, Multi-layer neural network, Dynamic programming, Comprehensive model
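The three cross-correction principles can be illustrated with a minimal Python sketch. It is not the thesis's implementation: it assumes that an Illumina read and an Ion Torrent read of the same region are already aligned and split into the same number of homopolymer runs (that is, no Illumina base-type error merges or splits a run). Following the first two principles, each corrected run takes its length from Illumina and its base from Ion Torrent; the third principle is what justifies trusting each platform for its own attribute. The function names `run_length_encode` and `cross_correct` are hypothetical.

```python
from itertools import groupby
from typing import List, Optional, Tuple

Run = Tuple[str, int]  # (base, homopolymer length)


def run_length_encode(read: str) -> List[Run]:
    """Collapse a read into homopolymer runs, e.g. 'AAC' -> [('A', 2), ('C', 1)]."""
    return [(base, len(list(group))) for base, group in groupby(read)]


def cross_correct(illumina_read: str, ion_read: str) -> Optional[str]:
    """Hypothetical cross-correction of two reads covering the same region.

    Assumes both reads split into the same number of homopolymer runs.
    Returns None when the run structures differ (real alignment would be needed).
    """
    illumina_runs = run_length_encode(illumina_read)
    ion_runs = run_length_encode(ion_read)
    if len(illumina_runs) != len(ion_runs):
        return None
    corrected = []
    for (_, illumina_len), (ion_base, _) in zip(illumina_runs, ion_runs):
        # Principle 1: trust Illumina for the homopolymer length.
        # Principle 2: trust Ion Torrent for the base type.
        corrected.append(ion_base * illumina_len)
    return "".join(corrected)


if __name__ == "__main__":
    print(cross_correct("AATGGGT", "AACGGT"))  # AACGGGT
```

In the example, the Illumina read `AATGGGT` contains a base-type error (C read as T) and the Ion Torrent read `AACGGT` contains a homopolymer-length error (GGG read as GG); combining the two yields `AACGGGT`. Real data would first require read alignment, which this sketch deliberately omits.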