
Genome Assembly Of Maize Inbred Line A188 And Its Application In Genetic Analysis Of Important Agronomic Traits

Posted on: 2024-01-02
Degree: Doctor
Type: Dissertation
Country: China
Candidate: P Liu
GTID: 1523307172960119
Subject: Crop Genetics and Breeding
Abstract/Summary:
China has a large population and a fixed area of arable land, so improving crop yield and quality is an effective way to ensure food security. Maize (Zea mays L.) is the crop with the largest planting area and the highest yield in China. Exploring superior genetic resources can effectively accelerate molecular breeding of maize for high yield, quality, disease resistance, and stress tolerance. The maize inbred line A188 is an excellent transgenic receptor with a high capacity for embryonic callus induction. In addition, A188 differs significantly in multiple agronomic traits from the inbred lines B73, Mo17, and W22, whose genomes have already been assembled with third-generation sequencing. In this study, we first assembled the A188 genome and then constructed a RIL population and an association panel. By combining comparative genomics, linkage analysis, association analysis, and selective region analysis, we analyzed the genetic basis of traits such as embryonic callus induction capacity (REC), ear row number (ERN), survival rate under salt stress (SR), and kernel color (KC), providing rich gene and variation resources for maize molecular and transgenic breeding. The main results of this study are as follows:

(1) A high-quality genome of A188 was assembled. We used PacBio and BioNano technologies to assemble an A188 genome of 2210.33 Mb. The assembly contains 4469 scaffolds, with a scaffold N50 of 11.61 Mb and a longest scaffold of 47.84 Mb. BUSCO assessment showed that 95.3% of the single-copy genes were assembled, and 94.03% of the sequences could be anchored to the B73 reference genome, indicating a high-quality assembly. Annotation showed that 80.70% of the genome consists of repetitive sequences, slightly lower than in other inbred lines such as B73, Mo17, W22, SK, and K0326Y. A total of 44,653 protein-coding genes, 66,359 protein-coding transcripts, and 8,603 non-coding genes were identified across the whole genome.

(2) Comparative genomics analysis revealed multiple PAVs in the A188 genome relative to those of B73, Mo17, and W22. The analysis showed that 60% of the A188 genome sequence was collinear with those of B73, Mo17, and W22, suggesting that genome sequences are conserved among different inbred lines. Variation analysis identified more than 10 million SNPs/InDels between the A188 genome and those of B73, Mo17, and W22, with 6.91% of the variants located in gene regions and their 5 kb upstream/downstream regions, where they may affect gene function. Analysis of large segmental deletions showed that, compared with B73, Mo17, and W22, A188 had specific regions with deletions of 16.92 Mb, 16.76 Mb, and 19.42 Mb, respectively. By combining these specific regions with previously identified QTLs for callus induction ability, ten candidate genes affecting embryonic callus induction ability were identified.

(3) The MQ2Gpipe pipeline was developed for population genotype construction. MQ2Gpipe is built on the Snakemake framework and encapsulates the core steps of genotyping: sequencing data quality control, alignment to the reference sequence, variant detection in individuals, population genotyping, filtering of low-quality sites, variant annotation, and imputation of missing genotypes. It automatically detects whole-genome variant sites for individuals or populations from raw sequencing data, which simplifies variant detection in large-scale samples and improves the efficiency of mining genomic data. Compared with the conventional approach of running each step through shell scripts, MQ2Gpipe processes multiple samples in parallel and can resume from breakpoints, which reduces the impact of manual operations on analysis efficiency and makes fuller use of computing resources. Using this tool, we detected the whole-genome variations of the linkage and association panels constructed in this study. The tool is simple to configure, produces reliable results, and is hosted on GitHub (https://github.com/Alipe2021/MQ2Gpipe) for free use by researchers.

(4) A recombinant inbred line (RIL) population was constructed using A188 as a parent, and a high-density genetic map was built for the population. A total of 290 homozygous families were generated by the single-seed descent method from two maize inbred lines, A188 and B73, which differ significantly in phenotype across multiple traits. The RILs were resequenced, and using the MQ2Gpipe tool with the A188 genome as the reference, 4,073,111 high-quality variants were identified in the population. These variants were used to form 6,013 bin markers with a total genetic map distance of 1145.88 cM. The average distance between bin markers was 0.19 cM, and each bin contained an average of 667 molecular markers. This genetic map has high marker density and good consistency, and can be used for genetic mapping of multiple phenotypes that differ between the parents.

(5) The genetic regions related to REC, ERN, and KC were identified using the genetic map. Combining the phenotypic and genotypic data for REC, ERN, and KC in this recombinant population, composite interval mapping detected 2, 5, and 7 QTLs for the three traits, respectively, at a threshold of LOD > 2.5. One major-effect QTL for REC overlapped with previously reported QTLs associated with callus formation and contained several functional genes related to callus formation, such as WUS2. One major QTL for ERN (qERN4-2) contained five candidate genes associated with ear row development and one previously reported gene, KRN4, which regulates ear row number. The identified ERN-related QTL overlapped with a 2.12 Mb interval of an ear number QTL previously identified by our research group. The genetic intervals for kernel color included one high-effect segment (R² = 45.8%), which contained a yellow endosperm gene, ZmY09GFa031728, that may directly affect kernel color, and a known kernel color gene, R2R3MYB, on chromosome 2. These results demonstrate that this recombinant population is suitable for genetic analysis of multiple agronomic traits.

(6) A high-density genotypic map of 80 core maize germplasms was constructed. An association panel of 80 maize inbred lines was resequenced, and using the MQ2Gpipe tool, a total of 2,945,497 high-quality SNPs and 1,648,206 InDels evenly distributed across the 10 chromosomes were identified. LD analysis showed that the average decay distance of the population was about 100 kb at R² = 0.3. Population structure analysis showed that the 80 inbred lines could be divided into four subgroups, the number of subgroups at which the cross-validation error reached its minimum. Principal component analysis showed that the first three principal components explained 24.06% of the genetic variation. Kinship analysis and a clustering tree built with the neighbor-joining (NJ) method likewise classified the 80 materials into four groups, consistent with the population stratification results. The rich phenotypic and genetic variation captured in this variation map lays a molecular foundation for the genetic analysis of different agronomic traits.

(7) The genetic and selective mechanisms underlying embryonic callus induction ability were analyzed using selective region analysis. Based on the high-density variation map of the 80 inbred lines, we calculated the population differentiation index between subgroups with different embryogenic callus induction ability and identified a 95.23 Mb selective region related to this trait. Candidate region association analysis of the selective region showed that 43 genetic loci and 103 candidate genes were related to maize embryogenic callus induction ability. Combined with six traits of interest in inbred line breeding, analysis of the LD blocks containing the 43 genetic loci revealed strong linkage between the significant loci for panicle leaf area and those for embryogenic callus induction rate, suggesting synergistic selection of the two traits during breeding.

(8) The genetic basis of survival rate (SR) under salt stress in maize seedlings was analyzed by GWAS using genotypic maps of different densities. Indoor hydroponic experiments showed that the broad-sense heritability and coefficient of variation of survival rate under salt stress in maize seedlings were 45.67% and 21.80%, respectively. Based on the low-density genotypic map, we identified 5 significant loci and 86 potential salt stress-responsive genes, including ZmPP2C (Zm00001eb013450), a protein kinase family member gene (Zm00001eb013480), and two hub genes, Zm00001eb013650 and Zm00001eb198930, that were differentially expressed in the transcriptome. Based on the high-density genotypic map, we identified 11 variant loci and 63 possible salt stress-responsive genes. According to the A188 reference annotation, the candidate gene ZmY09GFa015510 encodes an ankyrin repeat protein whose homolog has a verified salt response function in Arabidopsis. In addition, the candidate gene ZmY09GFa018164 encodes a C2H2-type zinc finger transcription factor, and the gene ZmY09GFa019618 encodes a MYB transcription factor; both types of transcription factor have been reported to be related to salt response in previous studies. The identification of these genes provides rich genetic resources for studying the genetic mechanism of the salt stress response.

In summary, the assembled A188 genome, the RIL population and its high-density genetic map, the natural population of 80 inbred lines and its high-density variation map, and the multiple genetic regions, variant sites, and candidate genes identified for these phenotypes lay a good foundation for maize molecular and transgenic breeding. The results also demonstrate that the A188 genome assembled in this study is of high quality and can be applied to the genetic analysis of different agronomic traits.
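
The scaffold N50 reported in result (1) follows a standard definition: sort the scaffold lengths from longest to shortest and report the length at which the running sum first reaches half of the total assembly size. The following is a minimal sketch of that calculation in Python. The function name `n50` and the toy lengths are illustrative assumptions; they are not taken from the dissertation or from the MQ2Gpipe tool, and the toy values are not A188 data.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length L such that sequences of length >= L
    together cover at least half of the total assembly size.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    ordered = sorted(lengths, reverse=True)  # longest first
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical scaffold lengths in Mb (illustrative only, not A188 data).
toy_scaffolds = [47, 30, 12, 11, 8, 5, 3]
print(n50(toy_scaffolds))
```

A larger N50 for the same total size means that more of the assembly sits in long scaffolds, which is why it is commonly reported alongside the scaffold count and the longest scaffold.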
Keywords/Search Tags: Maize genome assembly, Comparative genomics, Embryonic induction ability, Survival rate under salt stress, Linkage analysis, GWAS, Selective analysis