
PaSS: A Sequencing Simulator for PacBio Sequencing

Posted on: 2020-04-18
Degree: Master
Type: Thesis
Country: China
Candidate: W M Zhang
GTID: 2370330620960221
Subject: Bioinformatics
Abstract/Summary:
Background: Third-generation sequencing platforms, such as PacBio sequencing, have developed rapidly in recent years. PacBio sequencing generates much longer reads than second-generation sequencing (next-generation sequencing, NGS) technologies and has distinctive sequencing error patterns. Bioinformatics tools and algorithms for PacBio data analysis, including sequence alignment programs, genome assembly programs and structural variant callers, have been emerging. Simulated PacBio data can help users evaluate different analytical tools and approaches and determine critical parameters. Generating in silico data can also significantly reduce the cost and time required to improve downstream analysis tools. Moreover, PacBio sequencing has evolved quickly through multiple versions. An effective read simulator that can target a specific version of PacBio technology is therefore essential for evaluating and promoting the development of new bioinformatics tools for PacBio sequencing data analysis.

Results: We developed a new PacBio sequencing simulator called PaSS, which learns sequence patterns from currently available PacBio sequencing data. In addition to the multi-pass characteristic and the read length distribution, its sequencing error model includes context-specific bias and the high-error-rate unaligned regions. We compared PaSS with existing PacBio sequencing simulators, namely PBSIM, LongISLND and NPBSS, using the Kolmogorov-Smirnov (K-S) test to assess the results; PaSS performed better in many respects. Finally, we also used an indirect comparison method, comparing assemblies derived from data generated by the different simulators, and the results suggest that reads simulated by PaSS are the most similar to experimental sequencing data.

Conclusion: PaSS is an effective sequencing simulator for PacBio sequencing. It will facilitate the evaluation and development of new analysis tools for PacBio sequencing data and accelerate the application of PacBio sequencing.
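The abstract says the simulators were compared with the K-S test but does not state which read properties were tested. As a minimal sketch, assuming the comparison is between one per-read property (for example read length) in a simulated dataset and a real dataset, the two-sample K-S statistic D, the largest vertical gap between the two empirical cumulative distribution functions, can be computed as below. The function name `ks_statistic` and the example read lengths are hypothetical and are not taken from PaSS.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic D = sup_x |F_a(x) - F_b(x)|.

    F_a and F_b are the empirical CDFs of the two samples. Both step
    functions only change at observed values, so checking every observed
    value is enough to find the supremum.
    """
    a = sorted(sample_a)
    b = sorted(sample_b)
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    d = 0.0
    for x in a + b:
        # Fraction of each sample that is <= x (empirical CDF at x).
        fa = bisect_right(a, x) / len(a)
        fb = bisect_right(b, x) / len(b)
        d = max(d, abs(fa - fb))
    return d


if __name__ == "__main__":
    # Hypothetical read lengths (bp) for illustration only.
    simulated = [5200, 8100, 11900, 15300, 9800, 7600]
    real = [4100, 9000, 11000, 20500, 8800, 6900]
    print("D =", round(ks_statistic(simulated, real), 3))
```

A smaller D indicates that the simulated distribution is closer to the real one. In practice, `scipy.stats.ks_2samp` computes the same two-sided statistic together with a p-value.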
Keywords: third-generation sequencing, PacBio sequencing, sequencing simulator, sequencing error model