Background and purpose

A large part of the human Y chromosome, the non-recombining region of the Y chromosome (NRY), is inherited strictly from father to son. Because this region does not recombine during meiosis, all human NRY variants can be ordered, and the branching order and timing of paternal lineages in the phylogenetic tree can be inferred. As genetic information passes from father to son over generations, however, the Y chromosome gradually accumulates mutations. This accumulation is what makes the Y chromosomes of two individuals within the human paternal inheritance system differ, and the accumulated mutations constitute the Y-chromosome genetic markers. These markers mainly include microsatellite DNA, minisatellite DNA, satellite DNA, insertions and deletions, and single nucleotide polymorphisms (SNPs). Y-chromosomal microsatellites and SNPs, namely Y-STRs and Y-SNPs, are currently the markers most commonly used in forensic research and casework. Moreover, with the rapid development of next-generation sequencing (NGS) and falling sequencing costs, a large amount of human genome data will become available in the next few years. These data can be used to refine the Y-chromosomal phylogenetic tree and improve its resolution. However, the volume of data generated by NGS has increased dramatically and the data formats are more complex, which makes it difficult to extract the genetic information of a specific locus in forensic practice. To analyze and interpret Y-chromosome NGS data effectively, this study analyzes whole-genome second-generation sequencing data with three software tools, STRait Razor v3, AMY-tree and Yleaf, so as to compare the effectiveness of each tool, to provide a reference for methods of extracting Y-STR information and
high-resolution Y-SNP haplotypes from the sequencing results, and to support the application of second-generation sequencing in real forensic casework.

Methods

1. Genomic DNA was extracted from 2 ml of peripheral blood from a Henan Han male using the blood genomic DNA extraction kit from Shanghai Laifeng Company.
2. The whole genome of the sample was re-sequenced on a BGISEQ-500 second-generation sequencer.
3. Y-STR information was extracted from the second-generation sequencing data with the STRait Razor v3 software package. The package was downloaded from https://github.com/Ahhgust/STRaitRazor and installed on the Windows operating system. The .fastq file of the sequencing results was used as input to extract the Y-chromosomal STR information.
4. Specific Y-SNP information was extracted from the second-generation sequencing results with AMY-tree, and the branch of the sample in the Y-chromosome haplogroup tree (International Society of Genetic Genealogy, ISOGG, http://www.isogg.org/tree) was determined. AMY-tree was downloaded from bio.kuleuven.be/eeb/lbeg and installed on the Windows operating system. The list of Y-chromosome variants from the re-sequencing results was converted into the input format required by AMY-tree, and the other supporting files needed by the software were supplied. AMY-tree was then run to place the sample among the ISOGG haplogroup branches and to obtain the latest information on potential Y-SNPs.
5. Specific Y-SNP information was also extracted from the second-generation sequencing results with Yleaf, and the branch of the sample in the Y-chromosome haplogroup tree (ISOGG, http://www.isogg.org/tree) was determined. Yleaf was downloaded from https://www6.erasmusmc.nl/genetic_identification/resources/Yleaf/ and installed on the Linux operating system according to its README
file. Basic supporting packages such as Python, wget, libcurl, readline, R and samtools were installed in advance. Yleaf was then run at the Supercomputing Center of Zhengzhou University to analyze the .bam or .fastq files from the second-generation sequencing data and to assign Y-SNPs to the appropriate haplogroups.
6. The sample was typed with the Yfiler kit and was also genotyped for M117 to verify the information extracted by the software above.
7. The Y-STR detection rate in 50× whole-genome sequencing was obtained by comparing the number of Y-STRs identified by STRait Razor v3 with the number of Y-STRs detected by the Yfiler kit. For AMY-tree and for Yleaf, the Y-SNP recognition rate was calculated by dividing the number of SNPs that the software identified on the Y-chromosome haplogroup tree by the total number of Y-SNPs input into the software. A chi-square test was used to compare the recognition rates of AMY-tree and Yleaf.

Results

1. Whole-genome resequencing yielded 3,429,964 SNPs, of which 98.77% were present in the dbSNP database and 96.98% in the 1000 Genomes Project database. A total of 32,050 new SNPs were discovered genome-wide, and 2,825 SNPs were obtained on the Y chromosome.
2. Analysis of STR loci and sex information with the Powerseq.config component of the STRait Razor v3 package gave 48,742 items of sex information (Amelogenin gene) and 236 STR typing results, of which 49 were Y-STRs and 187 were autosomal STRs. Six Y-STRs reported by the software, together with their typing results, were essentially consistent with the Yfiler kit results.
3. AMY-tree assigned the sample to haplogroup O2a2c2c* [O-Page23*]. This is not fully consistent with the electrophoresis result for the sample, which was M117 derived, indicating that
the sample belongs to branch O2a2b1a1 of the haplogroup phylogenetic tree. The discrepancy arises because the AMY-tree database is still the 2014 version and is out of date.
4. Yleaf reported a total of 41,392 Y-SNPs on the Y chromosome, of which 908 were in the derived state, indicating that the sample belongs to the O2a2b1a1a1a1 a branch. The Yleaf result is more specific than the electrophoresis result for the sample (M117 derived, indicating branch O2a2b1a1 of the haplogroup phylogenetic tree).
5. In the 50× genome sequencing data, the detection/recognition rates for Y-DNA genetic marker information extracted by STRait Razor v3, AMY-tree and Yleaf were 35%, 73% and 99%, respectively. The Y-SNP recognition rates of AMY-tree and Yleaf differed significantly, with Yleaf being the more accurate.

Conclusion

1. The STRait Razor v3 software package extracts Y-STR and even autosomal STR information directly from whole-genome sequencing results. It runs on the Windows operating system, is easy to install and operates reliably, so it can be used routinely in current forensic DNA laboratories.
2. Yleaf is software for accurate, high-resolution haplogroup inference from all types of Y-chromosome NGS data.
3. STR typing based on next-generation sequencing builds on the existing data output method to resolve the STR locus fully and additionally captures sequence polymorphism within the STR, which significantly improves the power of STR loci for individual identification. The technical route for NGS-based STR typing is feasible.
4. The comparison of the effectiveness of each tool provides a reference for extracting Y-STR information from second-generation sequencing results and for improving Y-SNP haplogroup resolution. It supports the use of second-generation sequencing to obtain Y-DNA genetic markers in forensic evidence work.
5. Conventional whole-genome sequencing results can hardly provide forensic DNA
genetic markers in full. Second-generation sequencing for forensic purposes should differ from conventional whole-genome sequencing or whole-exome sequencing.
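
The recognition-rate calculation and chi-square comparison described in step 7 of the Methods can be illustrated with a minimal sketch. It assumes a plain Pearson chi-square test on a 2×2 table (identified versus not identified, for two tools) without continuity correction; the abstract does not state which variant was used. The function names and the counts in the usage lines are hypothetical placeholders chosen only to mirror the reported percentages, not the study's actual counts.

```python
import math


def recognition_rate(identified: int, total_input: int) -> float:
    """Fraction of input Y-SNPs that the tool placed on the haplogroup tree."""
    if total_input <= 0:
        raise ValueError("total_input must be positive")
    return identified / total_input


def chi_square_2x2(hit_a: int, total_a: int, hit_b: int, total_b: int):
    """Pearson chi-square (1 degree of freedom, no continuity correction)
    comparing the recognition rates of two tools. Returns (chi2, p_value)."""
    miss_a, miss_b = total_a - hit_a, total_b - hit_b
    n = total_a + total_b
    hits, misses = hit_a + hit_b, miss_a + miss_b
    if hits == 0 or misses == 0:
        raise ValueError("test undefined when a column total is zero")
    # Each cell: (observed count, row total, column total)
    cells = [
        (hit_a, total_a, hits), (miss_a, total_a, misses),
        (hit_b, total_b, hits), (miss_b, total_b, misses),
    ]
    chi2 = 0.0
    for observed, row_total, col_total in cells:
        expected = row_total * col_total / n
        chi2 += (observed - expected) ** 2 / expected
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical placeholder counts, NOT the study's data
rate_amy = recognition_rate(73, 100)
rate_yleaf = recognition_rate(99, 100)
chi2, p = chi_square_2x2(73, 100, 99, 100)
print(f"AMY-tree {rate_amy:.0%}, Yleaf {rate_yleaf:.0%}, chi2={chi2:.2f}, p={p:.2e}")
```

With real data, the four arguments would be the number of identified Y-SNPs and the number of input Y-SNPs for each tool; a small p-value would indicate that the two recognition rates differ significantly.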