
The Flow Of Long Amplicons Technology Improvement And Software Development For Third Generation Of Mixed Sequencing

Posted on: 2022-10-11
Degree: Master
Type: Thesis
Country: China
Candidate: Z F Cai
GTID: 2480306335995889
Subject: Computer Software and Application of Computer
Abstract/Summary:
With ultra-long reads, high throughput, and no GC bias, third-generation sequencing has become a new tool for de novo genome assembly, full-length transcriptome sequencing, and studies of structural variation, copy number variation, and methylation. Among these platforms, Pacific Biosciences (PacBio) developed single-molecule real-time (SMRT) sequencing, a sequencing-by-synthesis technology that can read fragments of a few kb or even tens of kb. However, the single-pass error rate of SMRT long reads is high, and improving read accuracy has been a persistent problem. To make the sequencing data more usable, PacBio upgraded its system and released the Sequel and Sequel II systems, which improve DNA polymerase activity and thereby yield more data and greater coverage depth. PacBio uses a circular template, the SMRTbell, so that a single template molecule is read in multiple passes; circular consensus sequencing (CCS) then derives a consensus sequence from these passes to produce accurate CCS reads (also known as HiFi-reads). The accuracy of HiFi-reads can reach 99.8%, close to that of NGS, at a recently reported coverage of only 28×. This is sufficient for most studies, such as population genetics and genetic diseases of domestic animals, but it remains questionable for human precision medicine.

In this study, to improve the workflow for third-generation mixed sequencing, we combined the multi-label sample-pooling approach of NGS with PacBio SMRT sequencing by adding 96 combinations of Illumina paired-end barcode sequences to universal mtDNA PCR primers. We also developed software for mixed-sequencing analysis. Together, these enabled high-volume mixed sequencing of mitochondrial genomes, increased the number of samples that can be pooled, and reduced the sequencing cost per sample.

We sequenced and analyzed mitochondrial genomes from 87 samples: 79 Tibetan Mastiffs, 7 blue sheep, and one snow leopard. The samples were amplified with long-range PCR primers carrying paired-end barcodes, and we successfully constructed long-amplicon libraries of approximately 8.6 kb and 8.8 kb for the mitochondrial genome, with each fragment labeled by a different barcode pair. PacBio sequencing yielded 763,518 subreads at an average sequencing depth of 4,338×, and CCS produced about 60,000 HiFi-reads of the target size, an average of about 340 fragments per sample.

We developed a mixed-sequencing pipeline for the PacBio Sequel or Sequel II system that separates fragments from pooled amplicons by recognizing the paired-end barcodes added during PCR. We also developed HQ-Reads Generator, an analysis package written in Python that significantly improves the accuracy of long reads. With this pipeline and software package, the accuracy of high-quality reads (HQ-reads) reached QV50. This work provides a new analysis workflow and software for small genomes, such as mitochondrial, chloroplast, and bacterial genomes, and for highly repetitive regions.
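The two ideas the abstract relies on, separating pooled reads by barcode pair and comparing accuracies such as 99.8% and QV50, can be illustrated with a minimal Python sketch. This is not the HQ-Reads Generator code, which the abstract does not show. The function names, the barcode sequences, the 30-base search window, and the use of exact matching are all assumptions made for illustration; real long-read data would need tolerance for mismatches and indels.

```python
import math

# Hypothetical helper names and barcodes; none of this comes from the thesis.
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def assign_sample(read, barcode_pairs, window=30):
    """Return the sample whose barcode pair flanks the read, else None.

    The forward barcode must occur in the first `window` bases and the
    reverse-complemented reverse barcode in the last `window` bases.
    Both read orientations are tried. Matching is exact (an assumption).
    """
    hits = set()
    for oriented in (read, reverse_complement(read)):
        head, tail = oriented[:window], oriented[-window:]
        for sample, (fwd, rev) in barcode_pairs.items():
            if fwd in head and reverse_complement(rev) in tail:
                hits.add(sample)
    # Reject reads with no match or with an ambiguous match.
    return hits.pop() if len(hits) == 1 else None


def demultiplex(reads, barcode_pairs):
    """Group reads by assigned sample; unassigned reads go under None."""
    bins = {}
    for read in reads:
        bins.setdefault(assign_sample(read, barcode_pairs), []).append(read)
    return bins


def qv_to_accuracy(qv):
    """Phred scale: QV = -10 * log10(error probability)."""
    return 1.0 - 10 ** (-qv / 10)


def accuracy_to_qv(accuracy):
    """Inverse of qv_to_accuracy; accuracy must be below 1."""
    return -10 * math.log10(1.0 - accuracy)
```

On the Phred scale, 99.8% accuracy corresponds to roughly QV27, while QV50 corresponds to one expected error per 100,000 bases (99.999% accuracy), which puts the improvement from HiFi-reads to HQ-reads in perspective.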
Keywords/Search Tags:SMRT mixed-sequencing, mtDNA, HiFi-reads, HQ-reads