Cotton is one of the most important economic crops. Previous studies have identified loci for important traits, but identifying causal genes and variants and understanding their roles in the formation and regulation of phenotypes remains a challenging and time-consuming task. With genome assemblies completed for multiple cotton species and multi-omics data accumulating, integrating and analyzing these data has become a reliable approach for locating causal genes and variants. In this study, we established a cotton multi-omics database (CottonMD, https://yanglab.hzau.edu.cn/CottonMD/) by collecting, integrating, and analyzing multi-omics datasets covering genomics, population genetics, transcriptomics, epigenetics, and metabolomics. We also applied multiple statistical methods to determine the associations among variants, gene expression, and phenotypes. CottonMD provides researchers with easy-to-use analysis tools to quickly retrieve relevant data and perform multi-omics analyses. The main results of this study are as follows:

(1) We collected and integrated a large amount of cotton multi-omics data, including 25 cotton reference genomes, whole-genome resequencing data for 4,180 cotton accessions, 20 phenotypes related to important agronomic traits, population expression data for 251 Gossypium hirsutum fiber tissue samples, transcriptome sequencing data covering 76 tissues of Gossypium hirsutum and Gossypium barbadense, epigenetic data from five cotton species, and metabolomic data on 768 metabolites from four tissues.

(2) Based on gene collinearity, we constructed 146,881 gene indexes from the genes of the 25 cotton genomes to link corresponding genes across genomes. We constructed a variation panel from 4,180 cotton accessions, the largest cotton population panel to date. Based on population structure analysis and sample information, the accessions were divided into eight subgroups (G0–G7); we calculated the genetic diversity and Tajima’s D of each subgroup, as well as FST and XP-CLR between subgroups. To determine the associations between variation and phenotype and reveal the underlying molecular mechanisms, we jointly analyzed the multi-omics data at the population level. eQTL analysis using population expression and genotype data identified a total of 41,176 eQTLs. GWAS of the 20 phenotypes identified 18 candidate loci, among which 1,190 candidate variants were significantly associated with six phenotypes. TWAS identified 483 candidate genes significantly associated with six fiber-related phenotypes, and SMR analysis identified 23 candidate genes significantly associated with six phenotypes. Colocalization analysis showed that the cis-eQTLs of 206 candidate genes colocalized with QTLs for 16 phenotypes. In addition, we processed the transcriptomic, epigenetic, and metabolomic data from different datasets using a uniform, standardized pipeline.

(3) Based on the collected and mined cotton multi-omics data, we established a cotton multi-omics resource platform (CottonMD). CottonMD provides online genome browsing, genome collinearity comparison, gene annotation queries, visualization of gene expression profiles, population variation queries, population genetics analysis, epigenetic data visualization, and metabolomic data analysis. CottonMD will facilitate the integration and sharing of multi-omics data and promote research on cotton genetics and breeding.

In summary, CottonMD provides rich cotton multi-omics data, germplasm resource management tools, and a variety of online multi-omics analysis tools, facilitating exploration of the "variation-gene expression-phenotype" associations and understanding of the mechanisms by which genetic variation affects gene expression and phenotype.
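The abstract reports Tajima's D for each subgroup but does not say which implementation was used. As a minimal sketch, assuming biallelic sites coded 0/1 in a haplotype matrix with no missing data, the standard Tajima estimator compares the mean pairwise difference (pi) with the number of segregating sites (S) scaled by a1. The function name `tajimas_d` is hypothetical and is not part of CottonMD.

```python
import math
from typing import Sequence


def tajimas_d(haplotypes: Sequence[Sequence[int]]) -> float:
    """Tajima's D for a 0/1 haplotype matrix (rows = sequences, columns = sites).

    Hypothetical illustration only: assumes biallelic sites and no missing data.
    """
    n = len(haplotypes)
    if n < 4:
        # For n = 2 or 3 the variance constants are zero, so D is undefined.
        raise ValueError("need at least 4 sequences")
    n_sites = len(haplotypes[0])

    seg_sites = 0  # S: number of segregating sites
    pi = 0.0       # mean number of pairwise differences
    for j in range(n_sites):
        k = sum(row[j] for row in haplotypes)  # count of allele "1" at site j
        if 0 < k < n:
            seg_sites += 1
            # fraction of the n(n-1)/2 sequence pairs that differ at this site
            pi += 2.0 * k * (n - k) / (n * (n - 1))

    if seg_sites == 0:
        return math.nan  # no polymorphism, so D is undefined

    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n**2 + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)

    s = seg_sites
    return (pi - s / a1) / math.sqrt(e1 * s + e2 * s * (s - 1))


haps = [[0, 0, 0], [1, 0, 0], [1, 1, 0], [1, 1, 1]]
print(round(tajimas_d(haps), 3))  # 0.168
```

A negative D indicates an excess of rare variants relative to neutral expectations, and a positive D indicates an excess of intermediate-frequency variants. Genome-scale scans usually compute D in sliding windows rather than over a whole chromosome.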
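FST between subgroups is mentioned without naming an estimator. The sketch below assumes a Hudson-style estimator computed from per-site allele frequencies and sample sizes (number of sampled chromosomes), with sites combined as a ratio of sums rather than an average of per-site ratios. This is a swapped-in, commonly used choice, not necessarily the estimator used in the thesis, and `hudson_fst` is a hypothetical name.

```python
from typing import Sequence


def hudson_fst(p1: Sequence[float], n1: Sequence[int],
               p2: Sequence[float], n2: Sequence[int]) -> float:
    """Hudson-style FST between two subgroups across sites (ratio of sums).

    p1, p2: allele frequencies per site in each subgroup.
    n1, n2: number of sampled chromosomes per site in each subgroup (must be > 1).
    """
    num = 0.0
    den = 0.0
    for a, na, b, nb in zip(p1, n1, p2, n2):
        # squared frequency difference, corrected for within-group sampling variance
        num += (a - b) ** 2 - a * (1 - a) / (na - 1) - b * (1 - b) / (nb - 1)
        # between-group heterozygosity
        den += a * (1 - b) + b * (1 - a)
    if den == 0:
        raise ValueError("no informative sites")
    return num / den


print(hudson_fst([1.0], [11], [0.0], [11]))  # 1.0 for fixed differences
```

Because of the sampling correction, the estimate can be slightly negative when two subgroups have nearly identical frequencies; such values are usually read as no differentiation.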
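The eQTL analysis pairs genotypes with population expression data. Its simplest form regresses a gene's expression on the genotype dosage (0, 1, or 2 copies of an allele) of one variant and tests whether the slope differs from zero. The sketch below assumes that minimal model with no covariates; real pipelines typically add covariates such as population-structure components and correct for multiple testing, and the abstract does not state the exact model. `eqtl_test` is a hypothetical helper.

```python
import math
from typing import Sequence, Tuple


def eqtl_test(dosage: Sequence[float], expression: Sequence[float]) -> Tuple[float, float, float]:
    """Simple linear regression of expression on genotype dosage.

    Returns (beta, standard error, t statistic); the t statistic has n - 2 degrees of freedom.
    """
    n = len(dosage)
    if n != len(expression) or n < 3:
        raise ValueError("need matching vectors with at least 3 samples")
    mx = sum(dosage) / n
    my = sum(expression) / n
    sxx = sum((x - mx) ** 2 for x in dosage)
    if sxx == 0:
        raise ValueError("variant is monomorphic in this sample")
    sxy = sum((x - mx) * (y - my) for x, y in zip(dosage, expression))
    beta = sxy / sxx                 # effect of one additional allele copy
    alpha = my - beta * mx           # intercept
    rss = sum((y - alpha - beta * x) ** 2 for x, y in zip(dosage, expression))
    se = math.sqrt(rss / (n - 2) / sxx)
    t = beta / se if se > 0 else math.inf
    return beta, se, t


beta, se, t = eqtl_test([0, 0, 1, 1, 2, 2], [1.0, 1.2, 2.0, 2.2, 3.0, 3.2])
print(round(beta, 3), round(t, 2))  # 1.0 16.33
```

Scanning this test over all variants within a window around each gene gives cis-eQTL candidates; the resulting summary statistics are also the inputs that TWAS, SMR, and colocalization methods build on.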
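SMR analysis combines GWAS and eQTL summary statistics for the top cis-eQTL variant of a gene. As a minimal sketch, assuming the approximate SMR test statistic T = z_GWAS² · z_eQTL² / (z_GWAS² + z_eQTL²), which is compared against a chi-square distribution with one degree of freedom, the function below returns the statistic and its p-value. The HEIDI heterogeneity test that usually accompanies SMR is not shown, and `smr_test` is a hypothetical name.

```python
import math
from typing import Tuple


def smr_test(z_gwas: float, z_eqtl: float) -> Tuple[float, float]:
    """Approximate SMR statistic and chi-square(1) p-value for one gene."""
    zg2 = z_gwas ** 2
    ze2 = z_eqtl ** 2
    if zg2 + ze2 == 0:
        return 0.0, 1.0  # no signal in either dataset
    t_smr = zg2 * ze2 / (zg2 + ze2)
    # With 1 degree of freedom, P(chi2 > T) = P(|Z| > sqrt(T)) = erfc(sqrt(T / 2)).
    p = math.erfc(math.sqrt(t_smr / 2.0))
    return t_smr, p


t_smr, p = smr_test(3.0, 4.0)
print(round(t_smr, 2))  # 5.76
```

The statistic is dominated by the weaker of the two signals, so a gene reaches significance only when both the GWAS and eQTL associations at the shared variant are strong.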