As a major domestic animal and biological model, the pig (Sus scrofa) holds an important place in biological research. Pigs were domesticated from wild boars independently in the Near East and China approximately 10,000 years ago, and the resulting populations carry large phenotypic and genotypic differences in their respective gene pools. However, current genetic studies of pigs are mostly based on a reference genome derived from a single individual, and the inability of a single reference genome to represent the genetic diversity of an entire species has been highlighted repeatedly in other species. It is therefore necessary to construct a pan-genome (from the Greek ‘παν’, meaning ‘all’) that captures both what is shared within the pig species and what differs between breeds. This study constructed a pig pan-genome from the current pig reference genome (Sscrofa11.1) and de novo assemblies of 11 geographically and phenotypically representative pig breeds worldwide. Using about three trillion bases of Hi-C data from 12 samples, together with 87 whole-genome resequencing datasets and 92 transcriptome sequencing datasets, we analyzed the pig pan-genome at multiple levels: genome, transcriptome, and three-dimensional (3D) genome structure. The results are summarized as follows:

1. To construct the pig pan-genome, we first performed assembly-versus-assembly alignment between each of the 11 Eurasian de novo assemblies and Sscrofa11.1. This identified 72.5 Mb of non-redundant sequence (~3% of the genome) absent from the reference genome, which we defined as pan-sequences. Homology alignment of the pan-sequences against outgroup genomes further verified their authenticity and indicated that many of them are likely ancestral sequences.

2. Clustering based on pan-sequence frequencies in European and Chinese pigs clearly separated the two groups, in patterns consistent with their genetic backgrounds. Of the pan-sequences, 9.0 Mb were predominant in Chinese pigs but occurred at low frequency in European pigs. One of these Chinese-predominant sequences, with confident expression evidence, contains the complete genic region of tazarotene-induced gene 3 (TIG3), which is involved in fatty acid metabolism and is likely related to phenotypes specific to Chinese pigs.

3. We combined Hi-C data with the pan-genome to provide a new strategy for anchoring pan-sequences to the reference genome. We showed that in some individuals, the allele represented by a pan-sequence interacts more strongly with its flanking sequences than the reference allele does. We also characterized the pan-sequences with respect to A/B compartments, topologically associating domains (TADs), and regulatory features, and showed that adding pan-sequences helps describe the 3D structure of the whole genome more accurately.

4. In addition, we constructed a pig pan-genome database (http://animal.nwsuaf.edu.cn/code/index.php/pan Pig), a comprehensive repository integrating genomic, transcriptomic, and regulatory data that makes it easier for researchers to use the pan-genome in biological research.

In summary, this study constructed a pig pan-genome using multiple de novo assemblies from different breeds. The 72.5 Mb of newly identified pan-sequences greatly enrich the pig genetic variation database. This study also showed that downstream analyses based on the pan-genome can provide more comprehensive genomic variation, whole-genome expression profiles, and 3D genome reconstruction. We therefore advocate a transition from the current single reference genome to a pan-genome.
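The frequency-based analysis in the second result (clustering by pan-sequence frequency and calling sequences predominant in Chinese pigs) can be illustrated with a minimal sketch. This is not the pipeline used in this study. It assumes that pan-sequence presence/absence has already been genotyped for each individual (the hypothetical `presence` mapping), uses hypothetical thresholds (frequency at least 0.8 in one population and at most 0.2 in the other) to call a sequence "predominant", and computes pairwise Jaccard distances between individuals that could then be passed to a hierarchical clustering routine. All function names and the toy data are illustrative.

```python
from itertools import combinations


def population_frequencies(presence, groups):
    """Return, for each pan-sequence, the fraction of each group's individuals carrying it.

    presence: dict mapping pan-sequence ID -> set of individual IDs that carry it
    groups:   dict mapping population name -> list of individual IDs
    """
    return {
        seq_id: {
            name: len(carriers & set(members)) / len(members)
            for name, members in groups.items()
        }
        for seq_id, carriers in presence.items()
    }


def dominant_in(freqs, target, other, hi=0.8, lo=0.2):
    """Pan-sequences common in `target` (freq >= hi) but rare in `other` (freq <= lo).

    The default thresholds are hypothetical, chosen only for illustration.
    """
    return sorted(
        seq_id
        for seq_id, f in freqs.items()
        if f[target] >= hi and f[other] <= lo
    )


def jaccard_distances(presence, individuals):
    """Pairwise Jaccard distance between individuals' pan-sequence contents.

    The symmetric result (a dict keyed by ID pairs) can be handed to any
    hierarchical clustering routine.
    """
    content = {
        ind: {seq_id for seq_id, carriers in presence.items() if ind in carriers}
        for ind in individuals
    }
    dist = {}
    for a, b in combinations(individuals, 2):
        union = content[a] | content[b]
        shared = content[a] & content[b]
        # Two individuals carrying no pan-sequences at all are treated as identical.
        d = 1.0 - len(shared) / len(union) if union else 0.0
        dist[(a, b)] = dist[(b, a)] = d
    return dist


if __name__ == "__main__":
    # Toy data (invented for illustration): three Chinese and three European pigs.
    presence = {
        "pan1": {"CN1", "CN2", "CN3"},
        "pan2": {"CN1", "CN2", "CN3", "EU1"},
        "pan3": {"EU1", "EU2", "EU3"},
        "pan4": {"CN1", "EU2"},
    }
    groups = {"Chinese": ["CN1", "CN2", "CN3"], "European": ["EU1", "EU2", "EU3"]}
    freqs = population_frequencies(presence, groups)
    print(dominant_in(freqs, "Chinese", "European"))  # ['pan1']
    dist = jaccard_distances(presence, groups["Chinese"] + groups["European"])
    print(dist[("CN2", "CN3")], dist[("CN2", "EU3")])  # 0.0 1.0
```

In the toy data, `pan2` is carried by every Chinese pig but also by one of three European pigs, so it fails the hypothetical `lo=0.2` cutoff; raising `hi` or lowering `lo` trades sensitivity for specificity. The criteria actually used in this study to call predominant sequences are not stated in this summary.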