The silkworm (Bombyx mori), a representative Lepidopteran insect, was fully domesticated from the wild silkworm through five thousand years of artificial selection. Because of its economic and cultural value, it holds an irreplaceable position in the historical development of China. To meet the increasing demand for silk products, breeding and research on the silkworm are urgently needed. In recent years, the continuous development of high-throughput sequencing technology and advances in molecular biology have opened new avenues for silkworm genetic breeding and molecular function research, making it possible to mine large amounts of genetic information and to apply molecular breeding strategies for directed breeding. The genome sequence is an important reference for silkworm molecular breeding, and a high-quality assembly and annotation are essential both for a comprehensive and accurate understanding of the silkworm's genetic material and for comparative genomics with other insects. However, the current silkworm genome still contains unsequenced regions and mismatches, which reduce the efficiency and accuracy of functional genomics research. In this study, the silkworm strain Dazao was used as the subject. Using third-generation long-read sequencing (PacBio) combined with chromatin conformation capture (Hi-C), we assembled a high-quality chromosome-level genome (SilkDB 3.0) and combined it with transcriptome data to obtain more comprehensive genome annotation. Then, based on the SilkDB 3.0 genome, we performed a multi-omics integration analysis of the silkworm in four aspects: transcriptome, 3D genome, pangenome and comparative genome. This study provides a new data basis for silkworm molecular breeding, helps to understand protein function and gene expression in the silkworm, and also helps to compare genetic variation among silkworm strains. The main findings are as follows:

1. Assembly of the SilkDB 3.0 genome of Bombyx mori

We assembled the SilkDB 3.0 version of the silkworm Dazao genome using PacBio sequencing and Hi-C technology. The SilkDB 3.0 genome consists of 28 chromosomes with a total size of about 468.3 Mb. The contig N50 reaches 17.6 Mb and the contig N75 reaches 15.1 Mb, indicating good continuity. In contrast, the SilkDB 2.0 genome is about 432 Mb, with a contig N50 of 4 Mb and a contig N75 of 1.6 Mb. BUSCO evaluation shows that the SilkDB 3.0 and SilkDB 2.0 genomes recover 98.1% and 92.3%, respectively, of the genes in the core insect gene set "insecta_odb9", of which single-copy genes account for 97.2% and 92%, respectively. The chromosome interaction heat map drawn from the Hi-C data also shows high consistency. The SilkDB 3.0 genome therefore achieves high accuracy in gene completeness and in the order and orientation of chromosome sequences, and provides a better data basis for silkworm functional genomics.

2. Annotation of the SilkDB 3.0 genome of Bombyx mori

Based on the high-quality SilkDB 3.0 genome, this study combined RNA-seq data from 253 silkworm samples to annotate a total of 16,069 high-quality genes. SilkDB 3.0 also uses KO, GO, KOG, Pfam and KEGG ENZYME to annotate protein sequences and obtain more gene function annotation. In addition, we predicted the subcellular localization and the three-dimensional structure of silkworm proteins. To fully understand changes in silkworm gene expression, we analyzed the silkworm gene expression profile (covering 10 developmental stages and 16 tissues) and a weighted co-expression network (combined with protein-protein interaction data from the STRING database). Taking the PRMT5 gene (protein arginine methyltransferase 5) as an example, the expression profile shows that this gene is highly expressed in the silkworm ovary, and the co-expression network predicts 36 co-expressed genes. Among them, the protein-protein interaction data from the STRING database show that 6 proteins are directly related to the PRMT5 protein. The gene annotation files, gene expression profiles and gene co-expression networks obtained in this study provide a more comprehensive data basis for the study of silkworm gene function and facilitate the mining of potential functional genes in the silkworm.

3. Comparative genomics of Bombyx mori

In this study, the genes of six insects (Bombyx mori, Spodoptera litura, Trichoplusia ni, Tribolium castaneum, Aedes aegypti and Drosophila melanogaster) were annotated with Pfam domains, and a phylogenetic tree was constructed based on the Pfam domains. A collinearity analysis was also carried out between the silkworm and Spodoptera litura. A clear distinction between orthologs and paralogs is essential for the robust reconstruction of gene evolution and for the functional annotation of newly sequenced genomes. We performed an orthologous cluster analysis of the same six insects, and the results showed that the six species share 4,765 core orthologous proteins. To better study the function and evolutionary relationships of proteins between the silkworm and the other five insects, this study mapped the protein network corresponding to each orthologous gene cluster, constructed a phylogenetic tree based on the orthologous proteins, and counted the amino acid motifs of the homologous proteins. Taking the Bombyx mori PRMT5 protein (BMSK0008583.1) as an example, the protein network shows that BMSK0008583.1 is more similar to Trichoplusia ni TRNI03335-PA and Spodoptera litura SLIT00777-PA, and the phylogenetic tree shows that Bombyx mori, Spodoptera litura and Trichoplusia ni diverged from the clade earlier than the other three insects (Tribolium castaneum, Aedes aegypti and Drosophila melanogaster).

4. Pangenomics of Bombyx mori

To understand the variation among different silkworm strains, we collected resequencing data for 163 representative silkworm strains from a wide geographic range. Using the SilkDB 3.0 genome as the reference, we comprehensively identified SNPs, InDels and other variants, and constructed phylogenetic trees of the 163 silkworm strains. Compared with the reference genome Dazao, the five wild silkworm strains (C2wild, Cwild, C6wild, C1wild and C5wild) had the largest numbers of SNPs, and the SNP phylogenetic tree divided the strains into eight main clades. Comparative analysis of SNPs between domesticated and wild silkworms showed that the fatty acyl desaturase (desat1) gene (BMSK0007052), which is related to sex pheromone synthesis, and the tyrosine-protein kinase (Btk29A) coding gene (BMSK0007690), which is related to male genital development and oogenesis, carry a large number of SNPs in wild silkworms but fewer SNPs in domesticated silkworms. Analysis of InDels in different varieties in relation to the cocoon color phenotype showed that Chinese tetramolter varieties with white cocoons have fewer InDels at the 5' end of the carotenoid-binding protein (CBP) coding gene (BMSK0000983) than Chinese tetramolter varieties with yellow cocoons.
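
For readers unfamiliar with the contig N50 and N75 statistics quoted for the assembly in finding 1, the following is a minimal sketch of how such values are conventionally computed. The function name `nxx` and the toy contig lengths are illustrative assumptions; they are not taken from the SilkDB 3.0 data or from the pipeline used in this study.

```python
def nxx(lengths, fraction):
    """Return the Nxx statistic for a list of contig lengths.

    Contigs are sorted from longest to shortest and summed; the result is
    the length of the contig at which the running total first reaches
    `fraction` of the total assembly length (0.50 for N50, 0.75 for N75).
    """
    if not lengths:
        raise ValueError("no contig lengths given")
    ordered = sorted(lengths, reverse=True)
    threshold = fraction * sum(ordered)
    running = 0
    for length in ordered:
        running += length
        if running >= threshold:
            return length


# Hypothetical toy contig lengths (arbitrary units), for illustration only.
toy_contigs = [80, 70, 50, 40, 30, 20, 10]
n50 = nxx(toy_contigs, 0.50)
n75 = nxx(toy_contigs, 0.75)
print(n50, n75)
```

Under this definition a larger N50 or N75 means that more of the assembly sits in long contigs, which is why the values are read as a measure of continuity.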