| Camellia oleifera, a member of the genus Camellia in the family Theaceae, is an important woody edible-oil species in China. It plays an important role in promoting the green growth of local economies, safeguarding national edible-oil security, and supporting rural revitalization. At present, the main cultivated varieties of C. oleifera are polyploid. High-quality assembly of polyploid plant genomes is a formidable task because of their large genome size, complex and uncertain ploidy types, and high heterozygosity. Camellia lanceoleosa, a diploid wild species in Camellia sect. Oleifera, is closely related to polyploid C. oleifera Abel. Decoding the genome of C. lanceoleosa is therefore of great significance for understanding the traits of C. oleifera at the molecular level. In this study, the whole genome of diploid C. lanceoleosa was sequenced, assembled and annotated using second- and third-generation sequencing technologies. Based on these genome data, comparative genomic analysis and MADS-box gene family analysis were carried out to provide a theoretical basis for molecular breeding and flowering regulation of C. oleifera. The main results are as follows: 1. Chromosome-level assembly and annotation of the C. lanceoleosa genome. A high-quality genome of C. lanceoleosa (2n = 30) was obtained using third-generation Nanopore sequencing, second-generation Illumina sequencing and Hi-C technology. The genome size was about 3.00 Gb, and the N50 was 1.20 Mb. The heterozygosity rate was 2.2%, and 91.85% of the sequences were anchored to 15 chromosomes. BUSCO evaluation recovered about 95.42% of complete gene elements, and CEGMA evaluation recovered 93.55% of core genes. Homology-based prediction, de novo prediction and transcriptome-based prediction were combined to predict gene structure, and 54,172 genes were identified in the C. lanceoleosa genome. Non-coding RNA annotation predicted 807 rRNAs, 170 miRNAs, 472 snRNAs and 920 tRNAs. The repeat sequence annotation showed that the repeat. 2. Comparative genome analysis of C. lanceoleosa. Comparative analysis
of the genomes of C. lanceoleosa, Camellia sinensis, Arabidopsis thaliana, Actinidia chinensis Planch. and Populus L. showed that C. lanceoleosa had 2,169 species-specific gene families comprising 8,006 genes. A comparative analysis of the genomes of C. lanceoleosa, C. sinensis, Arabidopsis thaliana, Actinidia chinensis Planch., Populus L., Vitis vinifera L. and Oryza sativa L. showed that 705 gene families containing 5,057 genes were significantly expanded in C. lanceoleosa. Estimation of divergence times showed that C. lanceoleosa and C. sinensis diverged 6-7 million years ago. A whole-genome duplication event occurred in the C. lanceoleosa genome about 65 million years ago, before the divergence of C. lanceoleosa and C. sinensis. Compared with C. sinensis, purifying selection on C. lanceoleosa genes was weaker, with 918 genes possibly undergoing positive selection. At the chromosome level, purifying selection on 7 chromosomes of C. sinensis was stronger than on those of C. lanceoleosa. 3. Analysis of the MADS-box gene family related to flowering regulation. A total of 104 MADS-box genes were identified by HMMER search. Phylogenetic analysis showed that the MADS-box gene family lacked two subfamilies, AGL12 and BS. Conserved motif analysis showed that most motifs outside the MADS domain were distributed only in specific groups. Chromosome localization and duplication analysis indicated that segmental duplication might be the main driver of MADS-box gene expansion. Homology analysis of ClMADS genes suggested that some homologous pairs may have formed after the divergence of dicotyledons and monocotyledons, while others may have formed before the divergence of their ancestors. Ka/Ks analysis suggested that the MADS-box gene family of C. lanceoleosa might have undergone strong purifying selection during evolution. Protein interaction analysis showed that some ClMADS proteins
interact with proteins that play important roles in flower development and flowering pathways. Homology modeling of protein 3D structures showed that most ClMADS proteins had similar structures. A large number of cis-acting elements related to light response, hormone response, stress response and development were found in the promoters of ClMADS genes. In addition, expression profile analysis indicated that ClMADS21, ClMADS9 and ClMADS74 might be involved in flowering regulation. |
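
The N50 value reported for the assembly can be illustrated with a minimal sketch. It assumes the standard definition: the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the total assembly length. The function name `n50` and the toy lengths are hypothetical and are not taken from the study.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths (standard definition)."""
    if not lengths:
        raise ValueError("need at least one sequence length")
    # Walk from the longest sequence downwards, accumulating length.
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first sequence that brings the running sum to half the total is the N50.
        if running >= half_total:
            return length


# Hypothetical toy assembly: total length 300, so half is 150.
toy_contigs = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_contigs))  # 80 + 70 = 150 reaches half the total, so N50 is 70
```

A larger N50 at the same total length indicates a more contiguous assembly, which is why it is commonly reported alongside genome size.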