Analysis Of The Genetic Resources Of World Cattle Breeds And Construction Of Pan-genome Using Non-reference Genome Sequences

Posted on:2023-09-21

Degree:Master

Type:Thesis

Country:China

Candidate:X T Han

GTID:2543306842969209

Subject:Agriculture

Abstract/Summary:

Genomic sequences that differ among cattle breeds and individuals are the main reason for their different phenotypes and economic values. The long-standing use of the genome of a single breed/individual (Hereford cattle) as the reference for cattle research has severely limited the exploitation of superior genetic resources in other breeds and individuals. In this research, we used high-depth sequencing data (read depth > 15×) to identify sequence differences (insertion variants) in 450 cattle of 31 breeds relative to the Hereford cattle reference genome, evaluated the biological effects of these unknown-sequence insertion variants, and further explored the genetic resources they carry. Finally, we used second-generation high-throughput sequencing data from 898 individuals of 57 breeds to construct a representative multi-breed pan-genome based on the Hereford cattle reference genome, annotated the unknown sequences in the pan-genome that have coding ability, and formed a preliminary representative cattle reference genome. The main results of the study are as follows.

(1) A total of 58,862 unknown-sequence insertion variants were detected in the second-generation high-throughput sequencing dataset of 450 cattle from 31 breeds with read depth > 15×. The same insertion variant site was detected up to 791 times, the total length of the insertion sequences was about 194 Mb, and the average insertion length was 315 bp; insertion lengths were mainly concentrated in the range of 50-1000 bp. Notably, the number (1886) and length (343 bp) of insertional variants in the Bos indicus populations were larger than in the Bos taurus populations, and significantly larger than the number (1110) and length (305 bp) of insertional variants in Hereford cattle. This provides preliminary evidence of the limitations of the Hereford cattle reference genome. Annotating the insertion sites against gene functional elements showed that they were enriched in intergenic regions of the bovine genome; 23,423 insertion sites occurred in genic regions and only 22% in coding regions, where they may affect gene coding sequences.

(2) Population structure was further analyzed using the insertional variants. Principal component analysis showed that the first principal component clearly distinguished three groups (Bos indicus, crossbred cattle and Bos taurus), and pedigree analysis yielded the same result, demonstrating that insertional variation of unknown sequences differed selectively among cattle breeds. A total of 333 significantly differentiated loci (top 1%) were identified by calculating the population differentiation index (Fst) of the insertional variant loci between the Bos indicus and Bos taurus populations. These significant insertion sites affected a total of 190 genes. GO and KEGG functional enrichment analysis of the affected genes showed significant enrichment (p < 0.05) in terms related to olfaction, immunity and substance metabolism, indicating that the bovine reference genome lacks sequence information related to these biological functions, in agreement with previously reported results.

(3) Based on second-generation high-throughput sequencing data of 898 cattle from 57 breeds (read depth > 5×), a total of 4,285,821,838 sequences that did not align successfully to the reference genome were extracted and assembled into 2,791,151 contigs; removing sequences shorter than 1000 bp and those classified as contaminants left a total of 543,702 sequence fragments. Sequence similarity matching then identified 38,980 representative sequences with a total length of about 74 Mb. A bovine pan-genome representing the genomic information of these 898 cattle was constructed by using the latest bovine reference genome (ARS-UCD1.2) as the framework and adding the non-reference sequences missing from it. These non-reference sequences accounted for 2.662% of the bovine pan-genome.

(4) One-end-anchored reads, indirect BLAST and EST sequences were used to locate 300, 1,256 and 23 non-reference sequences, respectively. By comparing this localization information with known annotation files, 170 protein-coding genes, 13 long non-coding RNAs, 7 pseudogenes and 309 exons could be annotated. Of all 38,980 non-reference sequences, 16,078 (41.25%) were predicted to be able to encode proteins and transcripts.

Starting from insertion variants in the genome, this research analyzed the distribution characteristics and biological effects of insertion variants in different cattle populations, which fully illustrated the limitations and biases of a single reference genome. On this basis, a cattle pan-genome representing 898 individuals was initially constructed. It provides necessary reference sequence information for genomic selection breeding and the mining of genetic information in the cattle breeding industry, and even for the treatment and prevention of diseases caused by structural variants.
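
The Fst screen described in result (2) can be illustrated with a small sketch. The abstract does not say which Fst estimator or software was used, so the code below assumes a simple Wright-style Fst for a biallelic locus (insertion present or absent) with equal population weights. The locus names and allele frequencies are made-up illustrative values, and `wright_fst` and `top_fraction` are hypothetical helper names, not the author's pipeline.

```python
def wright_fst(p1, p2):
    """Wright-style Fst for one biallelic locus.

    p1, p2: frequency of the insertion allele in population 1 and 2.
    Assumption: both populations get equal weight; the thesis abstract
    does not specify the estimator that was actually used.
    """
    p_bar = (p1 + p2) / 2
    h_t = 2 * p_bar * (1 - p_bar)  # expected heterozygosity, pooled
    if h_t == 0:
        return 0.0  # locus is fixed in both populations, no differentiation
    h_s = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2  # mean within-population
    return (h_t - h_s) / h_t


def top_fraction(fst_by_locus, fraction=0.01):
    """Return the highest-Fst loci, keeping the given fraction (at least one)."""
    n_keep = max(1, int(len(fst_by_locus) * fraction))
    ranked = sorted(fst_by_locus.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:n_keep]


# Hypothetical insertion-allele frequencies: (Bos indicus, Bos taurus).
freqs = {
    "ins_0001": (0.95, 0.05),
    "ins_0002": (0.50, 0.45),
    "ins_0003": (0.10, 0.10),
    "ins_0004": (1.00, 0.00),
}

fst = {locus: wright_fst(p1, p2) for locus, (p1, p2) in freqs.items()}

# With only four toy loci, a 25% cut-off keeps one locus; the abstract
# reports a top 1% cut-off on the real data.
outliers = top_fraction(fst, fraction=0.25)
for locus, value in outliers:
    print(locus, round(value, 3))
```

On real data the same ranking would be applied across all insertion loci, and the retained loci would then be intersected with gene annotations to find the affected genes.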

Keywords/Search Tags:

insertional variation, population structure analysis, population differentiation index, pan-genome, gene annotation

Related items

1	Genome-wide Detection Of Selection Signatures In Sheep Populations With Use Of Population Differentiation Index F_ST
2	Analysis Of Mitochondrial Genome Sequences And Population Differentiation Of Several Invasion Leafminers
3	Genome Annotation And Comparative Analysis Of Zhenshan 97 And Minghui 63
4	Population Genetic Analysis Of Castor(Ricinus Communis) Based On Whole Genome Resequencing
5	Genetic Analysis Of Population Structure And Growth And Parapodia Of Apostichopus Japonicus Based On Genome Resequencing And GWAS
6	Analysis On Genetic Structure And Body Size Selection Signature Of Chinese Domestic Donkey Population Based On Whole Genome Sequencing
7	Study On Genetic Structure And Selection Of Domestication Of Brassica
8	Construction And Analysis Of T-DNA Insertional Mutation Population Of Lilium Longiflorum
9	Assessment of genetic variation and population differentiation in invasive multiflora rose, Rosa multiflora Thunberg (Rosaceae) in northeastern Ohio
10	Establishment Of Loop-mediated Isothermal Amplification Assay And Analysis Of Population Differentiation Of 'Candidatus Liberibacter Asiaticus'