Applications Of 2b-RAD Technology In Pre-assemblies Scaffolding And Genetic Markers Calling

Posted on: 2016-11-10
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Z Dou
Full Text: PDF
GTID: 1220330473458053
Subject: Genetics
Abstract/Summary:
Genome-scale genetic studies of non-model organisms remain challenging because genetic information is lacking. Recently developed restriction-site associated DNA (RAD) sequencing, a method that samples target genomes at reduced complexity, promises to deliver high-resolution genomic data, namely thousands of sequenced markers across many samples, for non-model organisms at reasonable cost. It has found wide application in fine-scale linkage mapping, phylogenetics and phylogeography, genome scaffolding, and population genetics.

1. Scaffold assembly based on low-cost Happyseq technology

Assembling complex genomes from short reads remains a major challenge and typically yields highly fragmented assemblies. To address this, we show that genome-wide, high-density BsaXI tags generated from a HAPPY experiment can be used as genomic distance proxies to accurately position fragmented contigs without requiring any sequence overlap. To exploit this, we present a new and easy method that uses fosmid library pools as a HAPPY panel without constructing 3D clone pools, and we develop a hierarchical assembly algorithm that incorporates this sampling technology for scaffolding pre-assemblies. Simulation analysis showed that 35,618 BsaXI tags derived from the A. thaliana genome can be grouped into 40 contigs with a corrected N50 size of 4.1 Mb, while 95,139 BsaXI tags derived from chromosome 1 of H. sapiens fall into 16 contigs with a corrected N50 size of 14.4 Mb. Using the empirical A. thaliana dataset, 34,753 BsaXI tags fell into 554 linkage groups with an N50 size of 224 kb. We then demonstrate the approach by combining pre-assemblies generated from the PE300 dataset and PacBio long reads with the 2b-RAD map, achieving 98.1%-98.5% accuracy in scaffolding contigs. The proposed approach makes it possible to build high-quality genome sequences at low cost by incorporating a 2b-RAD map and PacBio long reads in marine genome assembly projects.

2. Reference-free genetic marker genotyping based on 2b-RAD technology

Here we describe an improved maximum likelihood (ML) algorithm, called iML, which achieves high genotyping accuracy for SNP calling in non-model organisms without a reference genome. The iML algorithm incorporates a mixed Poisson/normal model to detect composite read clusters and efficiently prevents incorrect SNP calls resulting from repetitive genomic regions. Through analysis of simulated and real sequencing datasets, we demonstrate that, in comparison with ML, iML decreases the false positive rate (FPR) of SNP genotyping by 12%-23% for simulated datasets and by 7%-17% for empirical datasets, with no loss of detected SNPs. In addition, current RAD analytical tools are frequently used for scoring codominant markers only, while dominant markers, which are generated by disruption of recognition sites and are abundant in eukaryotic genomes, remain largely unexplored. Using dominant markers would greatly reduce the extensive sequencing effort required for large-scale marker development. Here we describe an integrated package called RADtyping that achieves accurate de novo codominant and dominant genotyping in mapping populations. The performance of RADtyping has been thoroughly evaluated using both simulated and real sequencing datasets. We show that dominant loci can be genotyped more reliably than codominant loci when the average sequencing depth is low. High genotyping accuracy (>96%) was confirmed by Sanger validation of RAD genotypes obtained from real sequencing datasets.

3. Performance evaluation of 2b-RAD technology in genomic selection for scallop breeding

Sequencing-based genotyping technologies such as 2b-RAD have already demonstrated significant advantages in reducing the cost of marker genotyping, but it remains unknown whether the marker density they provide is sufficient to estimate genomic breeding values (GBV) accurately in aquaculture breeding. In this study, we evaluated the performance of the 2b-RAD method in genomic selection (GS) of Yesso scallop. Simulation analysis demonstrated that, across different scenarios, prediction accuracy using the markers generated by 2b-RAD was only slightly lower than when all genetic markers were available. Furthermore, a subset of markers (i.e., 5,000) obtained using an RR library performed comparably to the full set of 2b-RAD markers, while the genotyping cost of GS projects was dramatically reduced to approximately 1/10th. For the real data analysis, we performed the evaluation using a family-based breeding population composed of 349 Yesso scallops. Across all traits, the accuracies of the prediction models ranged from 0.15 to 0.37 for the across-family dataset and from 0.23 to 0.36 for the within-family dataset. In summary, the genotyping flexibility and low cost of 2b-RAD make it an ideal genotyping-by-sequencing method for genomic selection in aquaculture breeding programs.
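
As a reading aid for the N50 sizes quoted in part 1, the following is a minimal sketch of the standard N50 statistic: the length L such that contigs (or linkage groups) of length at least L together cover at least half of the total assembled length. The function name `n50` and the toy lengths are hypothetical and are not taken from the dissertation. The abstract does not define how the "corrected" N50 is obtained, so this sketch covers only the plain statistic.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    Standard definition: sort lengths from longest to shortest and
    return the length at which the running sum first reaches half
    of the total. (Illustrative helper, not the author's code.)
    """
    if not lengths:
        raise ValueError("need at least one length")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Compare 2 * running with total to avoid fractional halves.
        if 2 * running >= total:
            return length


# Toy example with made-up contig lengths (in kb).
toy_lengths_kb = [2, 3, 4, 5, 6]
print(n50(toy_lengths_kb))  # 6 + 5 = 11 >= 20 / 2, so this prints 5
```

A larger N50 for the same total length means that fewer, longer pieces account for half of the assembly, which is why the statistic is used above to summarise how well the tags were grouped.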
Keywords/Search Tags: 2b-RAD technology, genome assembly, marker genotyping, genomic selection, Yesso scallop