| Masson pine (Pinus massoniana Lamb.), a member of the genus Pinus in the family Pinaceae, is the main industrial tree species for both resin and timber production in South China. It is fast-growing, high-yielding and of high quality, tolerates drought and barren soils, adapts to a wide range of conditions, and is widely and comprehensively utilized. Conventional breeding based on phenotypic selection suffers from long breeding cycles and low selection efficiency, and is time-consuming and laborious. Genomic selection (GS), by contrast, is an early-selection breeding technology that selects on genotype using genotypic variation, and it has been applied broadly in crop and animal breeding. It is regarded as a promising forest breeding technology for shortening the tree breeding cycle, improving selection efficiency and accelerating the breeding of new varieties. In this study, genotyping-by-sequencing (GBS) was applied to 293 samples from 60 families in the progeny test stand of elite families in the Fujian Masson pine seed orchard, and de novo assembly was used to call SNPs for Masson pine GS breeding. This is the first report on GS breeding of Masson pine using GBS, and the GS breeding technology system established during this work paves the way for constructing a GS breeding platform for forest trees in China. The main results are as follows. Genomic DNA was extracted from the needles of the 293 Masson pine samples and digested with two enzymes, EcoRV and SoaI; sequencing libraries were then constructed and sequenced on the Illumina NovaSeq 6000 platform in paired-end 150 bp mode. A total of 586 FASTQ files were obtained. After adapter removal and filtering of low-quality sequences, the average volume of clean data per sample was about 4.11 GB. Read lengths in the clean data ranged from 40 bp to 137 bp, with an average of 135.07 bp; the GC content ranged from 40.92% to 44.92% and the Q30 ranged from 91.03% to
95.01%. Two assembly pipelines, npGeno and MEGAHIT, were used to perform de novo assembly with different parameter settings and different numbers of sequenced sample files. The number of contigs varied with the assembly strategy and parameter settings in both pipelines: npGeno produced 6256 to 269,951 contigs, whereas MEGAHIT produced 1,810,243 to 486,003,127 contigs. The two original de novo assemblies built by npGeno and MEGAHIT from the same 24 sequenced samples were aligned with BLAST against the sequenced genome of loblolly pine (P. taeda) using a sequence identity threshold greater than 95%. This yielded two sets of 59,393 and 843,351 contigs, covering 3.28% (npGeno) and 5.17% (MEGAHIT) of the loblolly pine genome, respectively. Three reference sets were then used for SNP calling with the ref-ANGSD pipeline: the two de novo assemblies from npGeno and MEGAHIT and a set of Masson pine transcripts downloaded from NCBI. Three sets of clean SNPs were obtained, numbering 859,314 (npGeno), 26,749,890 (MEGAHIT) and 9,656,901 (PmRNAseqT-SNP), respectively. After stricter filtering that allowed no missing data and required a non-common genotype frequency greater than 0.02 among the 293 samples, three sets of core SNPs were obtained, numbering 17,213 (npGeno), 568,124 (MEGAHIT) and 69,306 (PmRNAseqT), respectively. The contigs harbouring core SNPs from npGeno and MEGAHIT were searched with BLAST against three public databases (NR, KEGG and GO) for sequence annotation. The annotation rates of contigs from the npGeno assembly were 21.36% (NR), 2.40% (KEGG) and 27.00% (GO), and those of contigs from the MEGAHIT assembly were 36.25% (NR), 3.8% (KEGG) and
41.72% (GO), respectively. To explore the genetic prediction accuracy (GPA) of Masson pine GS breeding, three traits (tree height, diameter at breast height (DBH) and timber volume) were used as target traits for GS prediction with the three sets of core SNPs, implemented in the rrBLUP GS model under different parameter settings. The three parameters assessed, namely the source of the SNPs, the number of SNPs and the proportion of the training population, all had a significant impact on the accuracy of genomic prediction in Masson pine GS breeding. In general, the GPA based on SNPs developed from Masson pine transcripts was higher than the GPA based on SNPs from the de novo assemblies (npGeno or MEGAHIT). The timber volume GPA based on the Masson pine transcript SNPs reached 47.09%, the highest value among all traits and parameter settings assessed in this study, while the timber volume GPA for the other two SNP sets was 23.29% (npGeno-SNP) and 45.24% (MEGAHIT-SNP), respectively. As the proportion of the training population increased, the GPA of the three target traits increased slightly, with small fluctuations within each SNP set. For example, the GPA of DBH ranged from 19.80% to 22.86% (npGeno-SNP), 35.20% to 39.79% (MEGAHIT-SNP) and 44.14% to 46.27% (PmRNAseqT-SNP), so the differences between GPA values depended mainly on the source of the SNPs used in GS breeding. The number of SNPs had only a slight influence: the GPA increased slowly while the number of SNPs was below 3000 (npGeno-SNP) or 5000 (MEGAHIT-SNP or PmRNAseqT-SNP), and tended to stabilize once the number of SNPs in each set exceeded these thresholds. Furthermore, the GPA values of tree height, DBH and timber volume in Masson pine GS breeding based on
different sets of SNPs within each SNP source were obtained: tree height GPA values ranged from -0.09% to 20.06% (npGeno-SNP), 13.18% to 43.69% (MEGAHIT-SNP) and 13.18% to 45.11% (PmRNAseqT-SNP); DBH GPA values ranged from 0.27% to 23.29% (npGeno-SNP), 13.44% to 45.33% (MEGAHIT-SNP) and 13.87% to 44.03% (PmRNAseqT-SNP); and timber volume GPA values ranged from -1.16% to 21.12% (npGeno-SNP), 14.77% to 45.24% (MEGAHIT-SNP) and 14.76% to 47.09% (PmRNAseqT-SNP). In conclusion, SNPs called against a reference based on mRNA sequencing give higher prediction accuracy in Masson pine GS breeding, and it is necessary to develop the forest GS breeding platform based on SNPs from expressed gene regions, i.e., RNA-seq. |
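
The core-SNP filter mentioned in the abstract (no missing data, non-common genotype frequency above 0.02) can be illustrated with a minimal Python sketch. This is not the authors' pipeline, which used ref-ANGSD. The function name `filter_core_snps`, the 0/1/2 genotype coding, the `None` marker for missing calls and the toy data are all assumptions made for illustration. The "non-common genotype frequency" is read here as the share of samples whose genotype differs from the most common one, which is one plausible interpretation rather than a confirmed definition.

```python
from collections import Counter

MISSING = None  # assumed marker for a missing genotype call


def filter_core_snps(genotypes, min_noncommon_freq=0.02):
    """Keep SNPs with no missing calls and enough non-common genotypes.

    genotypes: dict mapping a SNP id to a list of per-sample calls
               (0, 1 or 2 copies of the alternate allele, or MISSING).
    """
    core = {}
    for snp_id, calls in genotypes.items():
        # Criterion 1: no missing data in any sample.
        if not calls or any(call is MISSING for call in calls):
            continue
        # Criterion 2 (assumed reading): the share of samples whose genotype
        # differs from the most common one must exceed the threshold.
        most_common_count = Counter(calls).most_common(1)[0][1]
        noncommon_freq = (len(calls) - most_common_count) / len(calls)
        if noncommon_freq > min_noncommon_freq:
            core[snp_id] = calls
    return core


# Toy data for 100 hypothetical samples (not taken from the study).
toy = {
    "snp_keep": [0] * 95 + [1] * 5,                 # 5% non-common, complete
    "snp_rare": [0] * 99 + [1],                     # 1% non-common, too rare
    "snp_missing": [0] * 90 + [1] * 9 + [MISSING],  # one missing call
}
core = filter_core_snps(toy)
print(sorted(core))
```

In this toy run only the first SNP passes both criteria. A real filter would operate on genotype likelihoods or called genotypes from the SNP-calling output, not on in-memory lists.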