Allele-aware Chromosome Assembly And Impacts Of Structural Variations On Artemisinin Yield Of Artemisia Annua

Posted on: 2022-06-27
Degree: Doctor
Type: Dissertation
Country: China
Candidate: B S Liao
GTID: 1484306350959329
Subject: Pharmacy
Abstract/Summary:
Artemisinin, derived from Artemisia annua, protects millions of people worldwide from life-threatening malaria. However, a large number of people are still infected with, and even die from, malaria every year, and one important reason for this high mortality is the insufficient supply of artemisinin drugs. A. annua plants remain the major source of artemisinin, so breeding elite germplasm with high artemisinin content is a crucial step toward easing the shortage of artemisinin drugs. Molecular-assisted breeding based on genomic data has made great progress in many crop species. High-quality genome assembly and annotation are critical for understanding variation among elite accessions and ecotypes but remain challenging for A. annua because of its high heterozygosity and repetitiveness. The key genes of the artemisinin biosynthetic pathway (ABP) and the related regulatory transcription factors have been well elucidated. However, the number, structure, and distribution of ABP genes across the whole genome, as well as their structural and expression differences among strains, are still unclear. To date, a draft genome of A. annua has been published, but its contiguity is not sufficient for gene localization and structural variation analysis. Constructing a platinum genome of A. annua will help elucidate artemisinin biosynthesis and its regulatory mechanisms and will further facilitate molecular-assisted breeding of A. annua.

In this study, four chromosome-level haploid maps were obtained from two chemotypically distinct A. annua strains: HAN1 (high-artemisinin strain, 1% of dry weight) and LQ-9 (low-artemisinin strain, 0.1 1% of dry weight). High heterozygosity (more than 2%) and repetitiveness (more than 70%) were found in the A. annua genomes. Multiple sequencing and assembly technologies were combined in the assembly process: first, PacBio HiFi reads were assembled into diploid contigs; second, the contigs were ordered and connected into superscaffolds by mapping to the Bionano optical map; third, the superscaffolds were separated into two phases based on allelic synteny; and finally, pseudochromosomes were constructed from the superscaffolds of each phase with Hi-C data. The Bionano optical map plays a key role in the assembly by bridging contigs into scaffolds. Repetitive sequences account for a large proportion of the genome, and many of them are longer than 10 kb, resulting in contigs of low continuity that cannot be effectively connected further by Hi-C data. In this study, more than 90% of the contigs were ordered and connected into superscaffolds by optical maps. The large improvement in sequence continuity and the ultra-long superscaffolds are the key prerequisites for haplotype phasing and chromosome construction. The assembly strategy provided in this study can be used to build high-quality reference genomes, which are essential for elucidating biosynthetic pathways and assisting the genetic breeding of herbs.

In total, 54,347 high-confidence protein-coding gene models were predicted in the LQ-9 phase0 genome. The evolutionary dynamics of gene families were analyzed by comparing the A. annua genome with those of 12 representative plant species. A total of 26,438 candidate gene families were identified, of which 3,125 were A. annua-specific and 1,073 appear to have expanded in A. annua. The estimated divergence time of A. annua and Chrysanthemum nankingense was 11.43 Ma, whereas that of A. annua and Helianthus annuus was 37.96 Ma. A whole-genome triplication (WGT) event occurred during A. annua evolution at 58.12 Ma. In addition, thanks to the high continuity and accuracy of the genome assembly, abundant tandem gene duplication loci (comprising 8,532 genes) were identified in the LQ-9 phase0 genome. Further analysis showed that post-WGT gene retention and tandem duplication play major roles in gene family expansion.

Rich genetic diversity was found within A. annua by pairwise alignment of the four assemblies with the public genome: 11,381,548 SNPs, 1,926,392 small insertions/deletions, and 162,257 large structural variants (SVs) were identified. Although SVs are relatively few in number, they affect a large number of bases and are the major contributor to genome heterozygosity. In parallel, genome resequencing was conducted on 36 distinct individuals representing a geographical distribution from southern to northern China and different artemisinin contents. In total, 36,104,389 SNPs were identified in the A. annua population, and the cumulative number of SNPs identified from the 36 individuals showed that many SNPs remain to be discovered, indicating high genetic diversity within A. annua.

By analyzing 24 RNA-seq datasets from four tissues, abundant tissue-specific genes were identified. All ABP genes showed high mRNA abundance in flowers and leaves. In leaves, 3,542 differentially expressed genes (DEGs) were detected between LQ-9 and HAN1. Among these DEGs, all copies of ADS and ADH1, one copy of DBR2, and one copy of CYP71AV1 were differentially expressed, showing higher mRNA levels in HAN1. A gene co-expression network constructed by WGCNA revealed that ABP genes were co-expressed with 1,744 other genes. These genes are functionally enriched in wax synthesis, which is closely related to glandular secretory trichome (GST) development and artemisinin synthesis. In addition, the co-expressed genes included 63 transcription factors, two of which were previously verified to regulate artemisinin biosynthesis. Thus, we speculated that the 63 transcription factor genes are potential regulators of artemisinin biosynthesis.

The chromosomal locations of ABP genes revealed that they are scattered across multiple chromosomes rather than clustered in neighboring regions. Multiple copies, mainly derived from tandem duplication, were prevalent among ABP genes. Strains with high artemisinin content have more copies of ADS and higher levels of ADS transcription. In the transcriptome data, all ADS copies were found to be transcribed. Hence, it is speculated that more copies of ADS increase its expression level and ultimately produce a higher artemisinin content. A strong positive linear correlation between artemisinin content and ADS copy number was observed across the 36 individuals, and the copy number of ADS could be a key characteristic for categorizing high- and low-artemisinin chemotypes, making ADS a reliable marker for the genetic breeding of A. annua. Except for DBR2 and CYP71AV1, the different copies of each ABP gene tended to have the same expression profile across tissues. Only one copy of DBR2 (DBR2.2) was highly expressed in leaves, and it was considered to participate directly in artemisinin biosynthesis. The difference in DBR2.2 expression level between strains was speculated to result from cis-regulation by diverse promoter sequences. The expression profiles of the two CYP71AV1 copies contradicted previous studies, and distinct 3'-UTRs were observed between the copies. These results suggest that a more sophisticated regulatory mechanism exists in artemisinin biosynthesis and that further studies are required. In addition, a genome database of A. annua with convenient tools was developed, which makes the genome data of A. annua publicly accessible and will promote genetic data mining of A. annua.
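
The statement that the cumulative number of SNPs across the 36 resequenced individuals shows many SNPs remain to be discovered rests on a discovery (accumulation) curve. The following is a minimal sketch of how such a curve can be computed, assuming SNP calls are available as one set of sites per individual. The function names, the SNP tuple layout, and the three toy individuals are hypothetical illustrations, not data or code from the dissertation.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical SNP representation: (chromosome, position, alternate allele).
Snp = Tuple[str, int, str]


def cumulative_snp_counts(calls: Dict[str, Set[Snp]], order: List[str]) -> List[int]:
    """Return the number of distinct SNPs seen after adding each individual in order."""
    seen: Set[Snp] = set()
    counts: List[int] = []
    for sample in order:
        seen |= calls[sample]      # union in this individual's SNP sites
        counts.append(len(seen))   # distinct sites discovered so far
    return counts


def new_snps_per_sample(counts: List[int]) -> List[int]:
    """Return how many previously unseen SNPs each added individual contributed."""
    if not counts:
        return []
    return [counts[0]] + [later - earlier for earlier, later in zip(counts, counts[1:])]


# Toy input (assumed values for illustration only, not results from the study).
toy_calls: Dict[str, Set[Snp]] = {
    "ind1": {("chr1", 100, "A"), ("chr1", 250, "T")},
    "ind2": {("chr1", 100, "A"), ("chr2", 40, "G")},
    "ind3": {("chr2", 40, "G"), ("chr3", 7, "C"), ("chr3", 90, "T")},
}

counts = cumulative_snp_counts(toy_calls, ["ind1", "ind2", "ind3"])
increments = new_snps_per_sample(counts)
print(counts)      # [2, 3, 5]
print(increments)  # [2, 1, 2]
```

If the increments shrink toward zero as individuals are added, the population's common variation has largely been sampled; increments that stay large, as in the toy input, correspond to a curve that has not yet levelled off. The result depends on the order in which individuals are added, so in practice the curve is usually averaged over several random orderings.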
Keywords/Search Tags:Artemisia annua, genome assembly, WGT, tandem duplication, structural variation, genetic breeding, database