High-Throughput Long Paired-End Sequencing Of A Fosmid Library By PacBio

Posted on: 2020-12-01 | Degree: Doctor | Type: Dissertation
Country: China | Candidate: Z Z Dai
GTID: 1360330611982889 | Subject: Biochemistry and Molecular Biology
Abstract/Summary:
The development of DNA sequencing technology has a short but rich history, with many advances in just over 40 years. Sanger's electrophoresis-based (first-generation) sequencing technology opened the door to DNA sequencing with its long read length and high accuracy, but its high cost and low throughput limit its development. Massively parallel (second-generation) genome sequencing technologies, with their low cost, high throughput, and high accuracy, have become the mainstay of biological sequencing, except that their short read lengths seriously hinder the study of large and complex genomes containing long repeats. Single-molecule, real-time sequencing technologies such as PacBio and Nanopore are new leading technologies with high throughput, long read lengths, and other advantages that have opened a new era of biological sequencing, although their disadvantages, such as a high error rate, cannot be ignored. These DNA sequencing technologies are currently being rapidly developed and updated, and they are widely used in genome research.
Genomic libraries are collections of genomic DNA from a given species that has been fragmented to specific sizes by biological, chemical, or physical disruption and then introduced into a host via a vector. They are important tools and materials for molecular cloning and for research on genome structure and function. Paired-end (or mate-pair) sequencing, which obtains paired-end sequences from genomic libraries with different insert sizes using different sequencing technologies, plays an important role in biological sequencing. For example, BAC clone end sequences generated by Sanger sequencing are used to construct physical maps that help resolve long repeats and segmental duplications and provide long-range connectivity in shotgun assemblies of complex genomes. Fosmids carry shorter inserts than BACs but are much easier to generate. Therefore, mate-pair sequencing of Fosmid clone ends on the Illumina platform enables the detection of structural variations, such as insertions, deletions, and inversions, that are predominantly mediated by repetitive elements, are commonly larger than 1 kb, and are difficult to identify using conventional small-insert paired-end libraries (300-500 bp). Moreover, paired-end sequences of Fosmid and BAC libraries have made significant contributions to identifying long-range inter- and intrachromosomal structural variations and to assessing the quality of whole-genome assemblies, even correcting misassemblies and reducing contig numbers. However, first- and second-generation sequencing platforms cannot generate DNA sequences longer than 1 kb, and the cost of first-generation sequencing is very high. Thus, the short read pairs (<1 kb) generated by these paired-end sequencing technologies are of limited use in assembling complex genomes: repetitive regions (>1 kb) are usually missing or misassembled, leading to fragmented and incomplete genomes. Longer paired-end reads are therefore required.
We developed a new method for long paired-end sequencing of large-insert libraries that can efficiently improve the quality of de novo genome assembly and identify large and small structural rearrangements or assembly errors. A Fosmid vector, pHZAUFOS3, was developed with the following new features: (1) two 18-bp non-palindromic I-SceI sites flank the cloning site and two more are present in the vector backbone, so that I-SceI digestion recovers long DNA inserts (in this work, the long paired-ends) as single fragments while cutting the vector (~8 kb) into 2-3 kb fragments, which are thereby effectively removed from the long paired-ends (5-10 kb); and (2) the chloramphenicol (Cm) resistance gene and the replicon (oriV), both necessary for colony growth, are located near the two sides of the cloning site, which helps increase the proportion of paired-end fragments relative to single-end fragments in the paired-end libraries. Paired-end libraries were constructed by ligating size-selected, mechanically sheared, pooled Fosmid DNA fragments to the ampicillin (Amp) resistance gene fragment and screening the colonies with Cm and Amp. We tested this method on yeast and Setaria italica Yugu1. Fosmid-size paired-ends with an average length of more than 2 kb per end were generated. The N50 scaffold lengths of the de novo assemblies of the yeast and S. italica Yugu1 genomes were significantly improved. Five large and five small structural rearrangements or assembly errors, spanning tens of bp to tens of kb, were identified in S. italica Yugu1, including deletions, inversions, duplications, and translocations.
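The idea behind the I-SceI site layout can be illustrated with a toy in-silico digest followed by size selection. This is a hypothetical sketch, not the dissertation's pipeline and not the real pHZAUFOS3 sequence: the construct, the `filler` helper, and the fragment sizes are invented for illustration. It assumes a circular molecule in which a ~7 kb target fragment (a stand-in for a long paired-end) is flanked by two I-SceI sites and a roughly 8 kb backbone carries two more sites. Each cut is approximated at the midpoint of the 18-bp site (the short overhang is ignored), and both strands are searched because the site is non-palindromic. Keeping only 5-10 kb fragments then retains the target and discards the 2-3 kb vector pieces.

```python
ISCEI_SITE = "TAGGGATAACAGGGTAAT"  # 18-bp, non-palindromic I-SceI recognition site


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase A/C/G/T string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_sites(seq: str, site: str = ISCEI_SITE) -> list[int]:
    """Start offsets of `site` on either strand of `seq`.

    The site is non-palindromic, so its reverse complement is searched too.
    """
    hits = set()
    for motif in (site, revcomp(site)):
        start = seq.find(motif)
        while start != -1:
            hits.add(start)
            start = seq.find(motif, start + 1)
    return sorted(hits)


def digest_circular(seq: str, site: str = ISCEI_SITE) -> list[int]:
    """Fragment lengths after cutting a circular molecule at every site.

    Simplification (assumption): each site is cut at its midpoint and the
    overhang is ignored, so lengths are approximate to within a few bp.
    """
    cuts = [s + len(site) // 2 for s in find_sites(seq, site)]
    if not cuts:
        return [len(seq)]  # no site: the circle stays intact
    n = len(seq)
    # Distance from each cut to the next one, wrapping around the circle;
    # a single cut yields one linear fragment of full length.
    return [(cuts[(i + 1) % len(cuts)] - cuts[i]) % n or n for i in range(len(cuts))]


def size_select(lengths: list[int], low: int, high: int) -> list[int]:
    """Keep fragments whose length falls inside [low, high] bp."""
    return [x for x in lengths if low <= x <= high]


def filler(n: int) -> str:
    """Hypothetical site-free placeholder sequence (repeating ACGT) of length n."""
    return ("ACGT" * (n // 4 + 1))[:n]


# Hypothetical construct, written linearly starting at the first site:
# site | 7,000 bp target | site | 2,600 bp | site | 2,600 bp | site | 2,600 bp
construct = (ISCEI_SITE + filler(7000) + ISCEI_SITE + filler(2600)
             + ISCEI_SITE + filler(2600) + ISCEI_SITE + filler(2600))

fragments = digest_circular(construct)
print(fragments)                            # [7018, 2618, 2618, 2618]
print(size_select(fragments, 5000, 10000))  # [7018]
```

In this toy model the two backbone sites are what make the vector removable: without them, digestion would release the target plus one ~8 kb vector fragment, which overlaps the 5-10 kb window and could not be separated by size.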
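Assembly improvement is reported as an increase in N50 scaffold length. As a reminder of what that metric measures, here is a minimal sketch of the common definition: the N50 is the largest length L such that sequences of length L or longer together cover at least half of the total assembly. The function name and the example lengths are hypothetical and are not data from this dissertation.

```python
def n50(lengths: list[int]) -> int:
    """Return the largest length L such that sequences of length >= L
    together cover at least half of the total assembly length."""
    if not lengths:
        raise ValueError("need at least one contig/scaffold length")
    total = sum(lengths)
    covered = 0
    for length in sorted(lengths, reverse=True):
        covered += length
        if 2 * covered >= total:  # integer comparison avoids float rounding
            return length
    raise AssertionError("unreachable")  # the loop always returns for non-empty input


# Hypothetical lengths (kb) before and after joining the 80 kb and 50 kb pieces,
# as long-range pairs might do during scaffolding (gaps ignored).
before = [100, 80, 50, 40, 30]
after = [130, 100, 40, 30]
print(n50(before), n50(after))  # 80 100
```

The total length is unchanged by joining, yet N50 rises, which is why N50 is a natural summary of how much long-range connectivity a paired-end library contributes.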
Keywords/Search Tags: Fosmid, long paired-end, PacBio, Ampicillin resistance gene tag, de novo assembly, structural rearrangement or assembly error