| Objective: In forensic practice, we often encounter difficult cases such as missing person identification and complex kinship analysis. Solving such cases requires typing many more genetic markers and obtaining more genetic information. Short tandem repeats (STRs) have been the dominant genetic markers in forensic DNA analysis for nearly 35 years. With capillary electrophoresis (CE) technology, more than 20 STRs can be co-amplified in a single multiplex system. However, this still cannot meet the practical requirements of some difficult cases. Next-generation sequencing (NGS), with its high throughput, can detect more loci in a single sequencing reaction and obtain both the length and the sequence information of each locus simultaneously, which increases the effective number of alleles. It is therefore a better choice for solving the difficult cases mentioned above. In this study, 42 autosomal STR loci and Amelogenin, all commonly used in current forensic DNA testing, were selected to construct an NGS-STR typing system. Libraries were sequenced in RUO mode on the Illumina MiSeq FGx™ platform, and an analysis module and method for the sequencing data were established. The NGS-STR typing system was also evaluated in the laboratory for accuracy, consistency, repeatability, sensitivity and mixture analysis. The study provides new data for the development of NGS-STR typing technology and a new detection scheme for difficult cases that would otherwise require the combined application of multiple kits. Methods: 1. Construction of the NGS-STR typing system: 58 DNA samples from healthy individuals were selected from the biological sample bank of our research group, and 2800 Control DNA was used as the positive control standard. DNA was quantified with the Qubit™ dsDNA HS Assay Kit, and DNA purity was measured with a Nano-Q™ micro-spectrophotometer. A total of 42 autosomal STR loci and
Amelogenin (D1S1656, CSF1PO, D10S1248, D10S1435, D11S2368, D12S391, D13S317, D13S325, D14S608, D15S659, D16S539, D17S1290, D18S51, D18S535, D19S253, D19S433, D20S470, D21S11, D21S1270, GATA198B05, D2S1338, D2S441, D3S1358, D3S1744, D3S3045, D4S2366, D5S2500, D5S818, D6S1043, D6S477, D7S1517, D7S3048, D7S820, D8S1132, D8S1179, D9S925, FGA, Penta D, Penta E, TH01, TPOX, vWA, Amelogenin) commonly used in forensic medicine were selected, and the NGS-STR typing system was synthesized using Qiagen's molecular barcode sequencing and single-end specific primer extension technology. Libraries were constructed according to the instructions of the QIAseq Targeted DNA Custom Panel. The starting DNA template amount was 20 ng and the purity was 1.8 to 2.0. Following the manual of the MiSeq sequencing kit, the libraries were normalized, denatured and diluted before sequencing. Illumina Experiment Manager (IEM) was used to set the sequencing parameters, and sequencing was performed in MiSeq FGx™ RUO mode. 2. Sequencing data analysis: Reads were aligned to the hg19 reference genome on a Linux system. Specific sequences of 3 to 10 bp at both ends of each STR were selected to match and screen the sequence information of the target region, and the matching and screening criteria were customized for each STR locus. The nomenclature guidelines recommended by the International Society for Forensic Genetics were followed to keep the results compatible with CE typing. 3. Forensic application evaluation of the NGS-STR typing system: Sequencing results of the same library were compared under different parameters (PE150 and PE300). The core sequences of the 42 STR loci and Amelogenin in 2800 Control DNA were analyzed and verified to evaluate the accuracy of the typing system. The NGS-STR and CE-STR typing results of the 58 samples were compared to evaluate their consistency. Libraries were constructed from 2800 Control DNA in triplicate to study the repeatability of the sequencing results. In the sensitivity study, libraries were constructed from 2800 Control DNA with different starting template amounts (10 ng, 5
ng, 2.5 ng, 1.25 ng and 0.625 ng). In the mixture study, libraries were constructed from two samples mixed at different ratios (1:1, 1:2, 1:4 and 1:9). Results: 1. Quality assessment of the experimental process: Fragment quality inspection shows that the libraries contain neither small fragments nor large trailing peaks, which is consistent with the expected peak profile. Molar concentration quality inspection shows that the Ct value of the blank control well is >29, the slope of the standard curve ranges from -3.1 to -3.5, the standard deviation of replicate wells is <0.4, and the amplification efficiency is 90% to 110%. In this study, the mean values of the main quality control indexes Q30, cluster density and clusters passing filter are 85.47%, 1007.33 K/mm² and 91.5%, respectively, all of which meet the official criteria for usable results. The STR typing results of the same library under PE150 and PE300 are identical. Other things being equal, the amount of usable sequencing data increases with the paired-end read length. 2. Laboratory evaluation of the NGS-STR typing system: At a 5% analysis threshold, the vast majority of noise is filtered out and the loci are typed correctly. The data quality statistics are as follows: the average depth of coverage (DoC) across all samples is 14720×; the average DoC across all loci is 2688×; by sequence composition, %Allele, %Stutter and %Noise are 97.01%, 2.64% and 0.35%, respectively; and the ACR averaged over all loci is 0.741. 3. Accuracy and compatibility studies: The ForenSeq™ DNA Signature Prep Kit and Sanger sequencing results are consistent with the core sequence information of all 2800 Control DNA loci detected by the NGS-STR typing system. The sequencing results of 2800 Control DNA show that the NGS-STR and CE-STR typing nomenclatures are incompatible at 9 autosomal STR loci. Based on the sequence alignment results, the core-sequence repeat numbers of these 9 autosomal STR loci are corrected and a new nomenclature strategy
is proposed. Under the new nomenclature strategy, 2494 loci from the 58 samples in this experiment are compared between NGS-STR typing and CE-STR typing. The results show that 2487 (99.72%) loci are consistent. In addition, 191 isoalleles are observed at 23 loci. 4. Repeatability and sensitivity studies: Allele genotyping is consistent across the three replicate experiments, and statistical testing shows no significant difference in %Allele or ACR, indicating that the NGS-STR typing system has good repeatability. At the 5% analysis threshold, the loci can be typed accurately when the initial DNA template amount is reduced to 2.5 ng. When the initial template is reduced to 1.25 ng, the sequencing depth and %Allele decrease sharply, while %Stutter and %Noise increase. Comparing ACR across the different starting template amounts, the coefficient of variation is relatively small (20.8%) at 2.5 ng of DNA template. When the initial DNA input decreases to 0.625 ng, the coefficient of variation rises to 41.1%, and the dispersion between loci increases. 5. Mixture analysis: At the 5% analysis threshold, no alleles are lost at mixture ratios of 1:1 and 1:2; alleles of the minor component are lost at mixture ratios of 1:4 and 1:9. Moreover, as the mixing ratio increases, the number of detected alleles and the detection rate decrease, and the number of lost alleles increases. 6. Population genetic analyses: Population genetic data of the 58 unrelated individuals are analyzed. All loci are in accordance with Hardy-Weinberg equilibrium and all loci are in linkage equilibrium. We compared the heterozygosity, power of discrimination and polymorphism information content obtained from length-based and sequence-based typing. The results show that the sequence polymorphism detected by NGS has greater forensic application value than the length polymorphism detected by CE. Conclusions: In this study, we construct an NGS-STR typing system including 42 autosomal STR loci and Amelogenin, together with sequencing data analysis pipelines and
nomenclature strategies. This typing system shows high accuracy, sensitivity and repeatability, and shows potential for mixed-sample identification. It provides new data for the development of NGS-STR typing technology and a new detection scheme for difficult cases that would otherwise require the combined application of multiple kits. |
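The 5% analysis threshold, the ACR statistic and the 99.72% consistency figure reported above can be illustrated with a minimal Python sketch. It is not the study's pipeline. The function names, the example read counts and the bracketed sequence labels are hypothetical, and two definitions are assumptions: that the threshold is applied to each sequence's share of the total reads at a locus, and that ACR denotes the allele coverage ratio, taken here as the lower read count divided by the higher one at a heterozygous locus.

```python
def call_alleles(read_counts, threshold=0.05):
    """Keep sequences whose share of the locus read depth reaches the threshold.

    Assumption: the analysis threshold is relative to the total reads at the locus.
    """
    total = sum(read_counts.values())
    if total == 0:
        return {}
    return {seq: n for seq, n in read_counts.items() if n / total >= threshold}


def allele_coverage_ratio(called):
    """Assumed ACR definition: lower allele depth / higher allele depth.

    Returns None unless exactly two alleles were called (heterozygous locus).
    """
    if len(called) != 2:
        return None
    low, high = sorted(called.values())
    return low / high


def consistency_percent(n_consistent, n_total):
    """Percentage of compared loci with identical NGS-STR and CE-STR calls."""
    return 100.0 * n_consistent / n_total


# Hypothetical read counts for one heterozygous locus (illustrative values only).
locus_reads = {
    "[TCTA]11": 1200,   # true allele
    "[TCTA]12": 1000,   # true allele
    "[TCTA]10": 90,     # stutter-like product, below 5% of locus depth
    "other": 10,        # noise
}

called = call_alleles(locus_reads)
print(sorted(called))                               # sequences passing the threshold
print(allele_coverage_ratio(called))                # balance between the two alleles
print(round(consistency_percent(2487, 2494), 2))    # figures taken from the abstract
```

With these example counts only the two high-depth sequences pass the threshold, and the last line reproduces the consistency percentage from the reported counts of 2487 consistent loci out of 2494.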