
A Study of High-Resolution Y-SNP-STR and Mitogenome with Next-Generation Sequencing for Forensic Genetics

Posted on: 2022-06-20 | Degree: Doctor | Type: Dissertation
Country: China | Candidate: M G Wang
GTID: 1524306551963099 | Subject: Forensic medicine
Abstract/Summary:
Objective: The non-recombining portion of the human Y chromosome (NRY) is widely used in forensic investigations, particularly when standard autosomal DNA profiling is uninformative. Male relatives typically share an identical Y-chromosomal short tandem repeat (Y-STR) haplotype, so forensic pedigree searches can be carried out by demonstrating Y-STR haplotype matches. However, Y-STR loci have relatively high mutation rates (approximately 1.0×10⁻⁴ to 1.0×10⁻² mutations per generation), so the more Y-STRs are typed, the greater the probability that at least one of them differs between male relatives because of mutation. Furthermore, a Y-STR haplotype match shows only that individuals are identical by state (IBS), not that they are identical by descent (IBD); individuals with matching Y-STR haplotypes may therefore be unrelated. A haplogroup is a group of similar haplotypes that share a common ancestor carrying a particular single nucleotide polymorphism (SNP) mutation, and male relatives are expected to share the same haplogroup. Because Y-SNP mutation rates are much lower (approximately 1.0×10⁻⁹ substitutions per generation), pedigree signatures persist far longer at Y-SNPs than at Y-STRs, so Y-SNPs can be used to trace human origins and to search paternal lineages. Human mitochondrial DNA (mtDNA) is present in hundreds to thousands of copies per cell and is transmitted as a non-recombining unit through matrilineal inheritance. Mutations in the mitochondrial genome (mitogenome) accumulate sequentially, so, given the time depth of these mutations, human mtDNA carries a molecular record of genealogical history. mtDNA haplogroups also show geographical or population specificity and could help forensic scientists perform forensic pedigree searches. To explore the practical value of Y-chromosomal genetic markers and mtDNA in forensic genetics, this study addresses the following aims: 1) Construct a high-resolution Y-SNP panel based
on next-generation sequencing (NGS) technology and build a high-resolution phylogenetic tree for Chinese populations, then validate the sequencing efficiency and practical value of the NGS Y-SNP panel and the resolution of the tree; 2) Genotype Mongolian, Tibetan, Yi and Sherpa individuals in China with SNaPshot-based Y-SNP panels, screen new Y-SNP loci from the preliminary genotyping results to construct further SNaPshot-based Y-SNP panels, and update the Chinese population high-resolution phylogenetic tree; 3) Use the newly constructed Y-SNP panels to subdivide these populations and identify their main paternal haplogroups, combining Y-STRs to explore the paternal genetic structure of the studied populations and to investigate the value of Y-SNP-STR in forensic pedigree searches; 4) Use mitogenomes to explore the maternal genetic structure of Chinese Mongolian, Tibetan, Sherpa, Mosuo and Li populations and identify their main maternal lineages; 5) Compare the distributions of paternal and maternal haplogroups within the same population and explore the value of mtDNA in forensic pedigree searches.

Methods: Y-SNPs were screened with reference to the International Society of Genetic Genealogy (ISOGG) Y-chromosome tree (2017 version), the Y Chromosome Haplotype Reference Database (YHRD), and earlier results from our laboratory. The inclusion criteria were: 1) Y-SNPs defining the major haplogroups in Chinese populations; 2) polymorphic in Chinese populations, with a terminal-branch haplogroup frequency below 5% in the studied populations; 3) phylogenetically key intermediate Y-SNPs that increase the resolution of the Chinese high-resolution tree; 4) stable Y-SNPs without back mutation; and 5) amenable to NGS primer design. Target sequences were submitted to the Thermo Fisher Scientific Ion AmpliSeq
Designer (http://www.ampliseq.com) for primer design. The Chinese population high-resolution phylogenetic tree and an NGS Y-SNP panel were thus initially constructed, and the sequencing performance of the custom Y-SNP panel was validated through repeatability, stability, consistency and sensitivity testing. To improve the population coverage and resolution of the tree, we first used Y-SNP panels built in previous studies to genotype 240 Hohhot Mongolians, 226 Hulunbuir Mongolians, 213 Ordos Mongolians, 101 Muli Tibetans, 95 Chengdu Tibetans, 58 Qinghai Tibetans, 104 Liangshan Yis and 161 Dingjie Sherpas. Y-SNPs were then screened according to the preliminary genotyping results and the phylogenies provided by the ISOGG and YHRD websites. The inclusion criteria were: 1) Y-SNPs belonging to sub-haplogroups of C-M130, N1b-F2930, Q-M242 and O2a2b1a1a1a4a-CTS4658, with the number of downstream branches included for each haplogroup scaled to its frequency in Chinese populations (the more common a haplogroup, the more downstream branches and corresponding Y-SNPs it was allotted); 2) stable Y-SNPs without reverse mutation; 3) phylogenetically key intermediate Y-SNPs that improve the completeness of the Chinese high-resolution tree; 4) a terminal-branch haplogroup frequency below 5% in the studied populations; and 5) amenable to the design of PCR primers and single base extension (SBE) primers for SNaPshot. Eligible Y-SNPs were selected according to these criteria, and stable Y-SNP panels based on SNaPshot typing were constructed. The Mongolian, Tibetan, Yi and Sherpa samples above were then further genotyped with the Y-SNP panels built in the present study, and all samples were also genotyped with the AGCU Y37 kit. Haplotypes and haplogroups were assigned, and forensic statistical parameters were calculated
according to the Y-STR and Y-SNP haplotypes. To explore genetic affinities and differences between the target and reference populations, we performed genetic distance estimation, multidimensional scaling (MDS), phylogenetic analysis and analysis of molecular variance (AMOVA) based on Y-STR and Y-SNP profiles. We also carried out haplogroup-based principal component analysis (PCA) to investigate genetic relationships among geographically, linguistically and ethnically distinct populations. Median-joining (MJ) networks for the major haplogroups in the studied populations were constructed with Network 10.1 and visualized in Network Publisher. The mitotypes of 41 Hohhot Mongolians, 45 Hulunbuir Mongolians, 45 Ordos Mongolians, 119 Muli Tibetans, 127 Hainan Lis, 35 Yanyuan Mosuos and 178 Dingjie Sherpas were determined. Mitogenomes were amplified with the Precision ID mtDNA Whole Genome Panel and sequenced on the Ion S5 XL Sequencer using the "conservative" method, according to the manufacturer's instructions. All sequencing data were processed with Torrent Suite Software v5.10.0, the Variant Caller v5.10.0.18 plugin and the HID Genotyper v2.01 plugin. Haplogroups were assigned with HaploGrep 2 based on PhyloTree Build 17 and reconfirmed with the updated query engine (SAM2) built into EMPOP. Molecular diversity indices, including the number of polymorphic sites, the mean number of pairwise differences (MNPD), nucleotide diversity, and Tajima's D and Fu's Fs neutrality tests, were computed with Arlequin v3.5.2.2. To explore haplogroup distributions and genetic relationships among geographically, linguistically and ethnically distinct populations, genetic distance estimation, MDS, AMOVA and MJ network analyses were performed on mitotypes, and PCA was performed on mitochondrial haplogroup data. The time to the most recent common ancestor (TMRCA) of
the prevailing haplogroups was estimated with the rho statistic-based method and the calibration rate for complete mtDNA sequences.

Results: A total of 170 Y-SNPs were initially selected, and primers were successfully designed for 166 of them. One Y-SNP (F2887, O2a2b1a2a1a3) was later excluded during data analysis because of insufficient amplification. The final NGS Y-SNP panel comprises 165 Y-SNPs that together define 160 distinct paternal haplogroups. The sequencing results demonstrated that the 165-plex NGS Y-SNP panel performed well and achieved high resolution. The custom NGS Y-SNP panel offers a straightforward sample-to-haplogroup workflow that should benefit paternal lineage classification and forensic pedigree searches. Based on preliminary Y-SNP haplotype/haplogroup results for 679 Mongolians, 254 Tibetans, 104 Yis and 161 Sherpas, together with the phylogenetic trees provided by ISOGG and YHRD, 58 Y-SNPs belonging to sub-haplogroups of C-M130, O2a2b1a1a1a4a-CTS4658, N1b-F2930 and Q-M242 were selected, and two panels were built: a C-M130 high-resolution Y-SNP panel containing 28 Y-SNPs and a complementary high-resolution Y-SNP panel containing 30 Y-SNPs. A Chinese population high-resolution phylogenetic tree containing 215 Y-SNP loci was then built. Validation showed that both high-resolution Y-SNP panels produced accurate genotypes with strong stability and repeatability. Among the 1198 unrelated males analyzed, 1118 distinct Y-STR haplotypes were observed, two of which were shared by individuals from different populations. Haplotype diversity (HD) ranged from 0.9978 (Muli Tibetan) to 1.0000 (Qinghai Tibetan and Liangshan Yi). A total of 292 alleles were observed at 31 single-copy Y-STR loci, and the numbers of allele combinations at the three multi-copy loci (DYS527, DYS385 and DYF387S1) were 62, 74 and
49, respectively. In addition, 34 null alleles, 48 intermediate alleles and 68 copy number variants were observed. Combined analysis of Y-STR and Y-SNP profiles showed that samples with a DYS448 deletion were associated with haplogroup C2a1a1b1-F1756, samples with intermediate alleles at DYS518 with haplogroup Q-M242, samples with copy number variants at DYS19 with haplogroup C2a1a2a-M86, and samples with copy number variants at DYF387S1 with haplogroup C2-M217. Gene diversity (GD) was highest at the three multi-copy loci, followed by the single-copy rapidly mutating Y-STRs (RM Y-STRs). As for haplogroup distribution, 93 haplogroups were detected in 679 Mongolian individuals, 44 in 254 Tibetan individuals, 33 in 104 Yi individuals, and only 7 in 161 Sherpa individuals. Haplogroup C2-M217 (34.46%) predominated in the Mongolian populations, followed by O2-M122 (30.34%) and N-M231 (11.49%); D1-M174 (55.51%) predominated in the Tibetan populations, followed by O2-M122 (24.02%); O2-M122 (38.46%) predominated in Liangshan Yi, followed by D1-M174 (19.23%), O1b-P31 (13.46%), N-M231 (12.50%) and O1a-M119 (6.73%); and O2-M122 (98.14%) also predominated in Dingjie Sherpa. We also observed haplogroups DE-M145, D1-M174, C1-F3393, G-M201, I-M170, J-M304, L-M20, O1a-M119, Q-M242 and R-M207 in the Mongolian populations; DE-M145, C2-M217, G-M201, J-M304, LT-P326, L-M20, N-M231, O1a-M119, O1b-P31, R-M207 and Q-M242 in the Tibetan populations; C-M130, F2-M427 and R-M207 in Liangshan Yi; and C2b1a2a-F1319, D1a1b1a2~-PH97 and O1b1a2a-F993 in Dingjie Sherpa. Overall, Y-chromosomal haplogroup distributions differed significantly among the studied populations. The genetic relationships revealed by pairwise genetic distances, MDS, AMOVA, PCA and
phylogenetic trees showed that analyses based on Y-SNP haplotype/haplogroup data better reflected the genetic structure of the studied populations. No population substructure was found among the Mongolian groups, and the studied Mongolian populations were genetically closer to Xinjiang Hui, Gansu Hui and Qinghai Hui. Muli Tibetan and Chengdu Tibetan were closer to Ü-Tsang Tibetan, Qinghai Tibetan was closer to Kham Tibetan, and Liangshan Yi was closer to Sichuan Hui, Shaanxi Hui and Henan Hui, whereas Dingjie Sherpa was genetically distant from the other populations. A total of 590 unrelated individuals from Muli Tibetan, Hainan Li, three Mongolian populations, Yanyuan Mosuo and Dingjie Sherpa were sequenced for the whole mitogenome, and 518 distinct mitotypes were observed. The haplotype diversity values were 0.9973, 0.9988, 1.0000, 1.0000 and 0.9919, respectively. A total of 49 point heteroplasmies (PHPs) were observed; all PHPs in the control region except 210R were present in the EMPOP database, whereas none of the PHPs in the coding region were present in EMPOP except 1692R, 5147R, 7642R, 9947R and 11778R. Length heteroplasmies (LHPs) were mostly observed in the regions nt303-nt315 and nt16184-nt16193 and in the dimeric repeat region nt515-nt524; in addition, six unusual LHPs were detected: 356.1C, 597.1T, 2135.1A, 4315.1T, 12241del and 16263.1A. Among all observed heteroplasmic sites, only 16263.1A may be associated with a haplogroup (J1b1b); the other heteroplasmies arose randomly and were not associated with specific haplogroups. Haplogroups were assigned to all 590 mitotypes based on PhyloTree Build 17, and 202 distinct haplogroups were detected. The prevailing haplogroups in Muli Tibetan were G, A, F, D, M9 and B; in Hainan Li, M7, B, F, M12 and D; in the three Mongolian groups, D, B, F, C, G and N9; in
Yanyuan Mosuo, F, C, D, B and M9; and in Dingjie Sherpa, M9, A, R32, D, C and F. All of these except R32 are common in East Asians. We also observed haplogroups C, M7, M11, M13, T, Y and Z in Muli Tibetan; A, G, M8, M9, N9 and R9 in Hainan Li; A, H, HV, I, J, M7, M8, M9, N1, R9, R11, T, U, Y and Z in the three Mongolian groups; A, G, M13, M49 and Z in Yanyuan Mosuo; and G, M13 and M62 in Dingjie Sherpa. Haplogroups HV, H, I, J, M49, N1, R11, T, U and Y, which occurred at low frequencies in these populations, are not among the predominant East Asian matrilineal haplogroups. The MJ networks of A15c1 and M9a1a1c1b1a placed the Sherpas downstream of Tibetan individuals, suggesting that the Sherpa maternal gene pool may derive from Tibetan populations. In addition, the MJ network of R32 placed the Sherpas downstream of Gujarati Indians, indicating that Dingjie Sherpa carries a considerable South Asian genetic component. This gene flow from the Indian subcontinent into the Sherpa supports gene flow from the southern foot of the Himalayas onto the Tibetan Plateau. Comparing Y-chromosomal and mitochondrial haplotypes/haplogroups within the same population showed that mitochondrial haplogroups were more diverse, whereas Y-chromosomal haplogroups were more population-specific; Y-chromosomal markers are therefore more suitable for forensic pedigree searches. We also found that individuals sharing the same Y-SNP-STR haplotype carried distinct mitotypes. Therefore, when forensic pedigree searches are conducted with Y-SNP-STR, typing and matching mitogenomes can help forensic scientists confirm or eliminate suspected pedigrees.

Conclusions: In the present study, we designed a high-resolution NGS Y-SNP panel containing 165 Y-SNPs for forensic pedigree searches in Chinese populations; the panel showed high stability and strong reproducibility and produced accurate
sequencing results. This NGS Y-SNP panel achieved high haplogroup resolution in Sinitic, Tibeto-Burman, Turkic and Tai-Kadai populations of China. We constructed a high-resolution C-M130 Y-SNP SNaPshot panel containing 28 Y-SNP loci and a complementary high-resolution N/O/Q Y-SNP SNaPshot panel containing 30 Y-SNP loci. Furthermore, our phylogenetic tree was upgraded to a Chinese population high-resolution phylogenetic tree containing 215 Y-SNP loci. The typing results for 37 Y-STRs and 215 Y-SNPs in 1198 unrelated males from Chinese Mongolic- and Tibeto-Burman-speaking populations provide reference data for forensic pedigree searches. The genetic diversity results for the 37 Y-STRs showed that the detected allelic variants were associated with specific haplogroups, which provides a theoretical basis for inferring haplogroups from Y-STR haplotypes. The fine-scale paternal genetic structure of Chinese populations was highly consistent with geographical division and linguistic classification. Combined Y-SNP-STR analysis showed that haplotypes within the same haplogroup were highly heterogeneous in Chinese populations, but identical haplotypes tended to belong to the same haplogroup. Compared with mitotypes based on the control region (CR) or hypervariable regions (HVRs), whole-mitogenome sequencing captured more genetic information and allowed more accurate and detailed maternal haplogroup classification. In contrast, the fine-scale maternal genetic structure of Chinese populations showed no obvious correlation with geographical division or linguistic classification. The distributions of Y-chromosomal and mitochondrial haplogroups within the same populations revealed a sex-biased history of genetic admixture. These results indicate that typing high-resolution Y-chromosomal markers and mitogenomes simultaneously on biological samples at the
crime scene would help exclude suspected families or individuals in forensic pedigree searches and individual identification.
Keywords/Search Tags: Forensic genetics, Forensic pedigree search, Y-SNP-STR, Mitochondrial genome, Haplogroup distribution, Genetic structure
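The Objective argues that typing more Y-STRs raises the chance that at least one locus differs between true male relatives. Under the simplifying assumption that loci mutate independently at fixed per-generation rates, that chance over g generations is 1 - prod_i (1 - mu_i)^g. The Python sketch below computes it; the function name and the example rates are hypothetical illustrations chosen within the 1.0×10⁻⁴ to 1.0×10⁻² range quoted in the abstract, not rates estimated in this study.

```python
def prob_any_mutation(rates, generations=1):
    """Probability that at least one Y-STR locus mutates over `generations` meioses.

    Assumes (hypothetically) independent loci with fixed per-generation rates.
    """
    if generations < 0:
        raise ValueError("generations must be non-negative")
    p_no_mutation = 1.0
    for mu in rates:
        if not 0.0 <= mu <= 1.0:
            raise ValueError("mutation rates must lie in [0, 1]")
        # The locus stays unchanged in every one of the generations.
        p_no_mutation *= (1.0 - mu) ** generations
    return 1.0 - p_no_mutation


# Hypothetical panels: a small set of moderate-rate loci versus a larger set
# that also includes rapidly mutating loci.
small_panel = [2e-3] * 17
large_panel = [2e-3] * 27 + [1e-2] * 10
```

Comparing `prob_any_mutation(small_panel)` with `prob_any_mutation(large_panel)` shows the effect described in the Objective: the larger panel, which includes faster loci, is more likely to show at least one difference between father and son.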
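The Results report haplotype diversity (HD) and per-locus gene diversity (GD). The abstract does not state the formula used; a common choice is Nei's unbiased estimator, n/(n-1) * (1 - sum of p_i squared), where p_i are the observed frequencies. The Python sketch below assumes that estimator; the function name and the example haplotypes are hypothetical, not study data.

```python
from collections import Counter


def nei_diversity(observations):
    """Nei's unbiased diversity: n/(n-1) * (1 - sum of squared frequencies).

    Pass whole haplotypes (e.g. tuples of repeat counts) for haplotype
    diversity, or the allele calls at one locus for gene diversity.
    """
    items = list(observations)
    n = len(items)
    if n < 2:
        raise ValueError("need at least two observations")
    counts = Counter(items)
    sum_sq = sum((c / n) ** 2 for c in counts.values())
    return n / (n - 1) * (1.0 - sum_sq)


# Hypothetical 3-locus Y-STR haplotypes (repeat counts), not study data.
haplotypes = [(15, 13, 22), (15, 13, 22), (16, 12, 21), (14, 13, 23)]
hd = nei_diversity(haplotypes)

# Gene diversity at the first locus uses the allele calls at that locus only.
gd_locus1 = nei_diversity(h[0] for h in haplotypes)
```

For a multi-copy locus such as DYS385, one reasonable convention is to treat the whole allele combination (for example a tuple) as a single observation, which is how allele combinations are counted in the abstract.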
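The Methods list the number of polymorphic sites, the mean number of pairwise differences (MNPD) and nucleotide diversity among the molecular diversity indices computed in Arlequin. The Python sketch below shows what these quantities measure for aligned, equal-length sequences; it is not Arlequin's implementation, and it ignores gaps, missing data and heteroplasmy, which real mitogenome data require handling. Function names and example sequences are hypothetical.

```python
from itertools import combinations


def polymorphic_sites(seqs):
    """Number of alignment columns showing more than one base."""
    return sum(1 for column in zip(*seqs) if len(set(column)) > 1)


def pairwise_differences(seq_a, seq_b):
    """Count sites at which two aligned sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for a, b in zip(seq_a, seq_b) if a != b)


def mean_pairwise_differences(seqs):
    """Average number of differences over all n*(n-1)/2 sequence pairs."""
    if len(seqs) < 2:
        raise ValueError("need at least two sequences")
    diffs = [pairwise_differences(a, b) for a, b in combinations(seqs, 2)]
    return sum(diffs) / len(diffs)


def nucleotide_diversity(seqs):
    """Mean pairwise differences divided by alignment length (per site)."""
    return mean_pairwise_differences(seqs) / len(seqs[0])


# Hypothetical toy alignment, not mitogenome data.
toy_alignment = ["AAAA", "AAAT", "AATT"]
```

Nucleotide diversity is simply MNPD scaled by sequence length, so for a full mitogenome of roughly 16.6 kb it is a much smaller number than MNPD itself.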
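The Methods estimate the TMRCA of prevailing mtDNA haplogroups with the rho statistic: rho is the average number of mutations between each sampled sequence and the haplogroup's ancestral (root) haplotype, and it is converted to years with a calibration rate. The Python sketch below assumes the root haplotype is already known (in practice it is inferred, for example as the founder node of an MJ network), scores haplotypes as sets of variants against a common reference, and ignores recurrent and back mutations. It omits rho's standard error, and the calibration value is left as a parameter because the abstract does not give it. All names and variant labels are hypothetical.

```python
def rho_statistic(samples, root):
    """Mean number of differences from each sample to the root haplotype.

    Each haplotype is a set of variant labels scored against one reference
    sequence; the distance to the root is the size of the symmetric
    difference (this assumes no recurrent or back mutations).
    """
    if not samples:
        raise ValueError("need at least one sample")
    root = frozenset(root)
    total = sum(len(frozenset(s) ^ root) for s in samples)
    return total / len(samples)


def tmrca_years(rho, years_per_mutation):
    """Convert rho to an age with a calibration of years per mutation."""
    return rho * years_per_mutation


# Hypothetical founder and three descendant mitotypes (labels are illustrative).
root = {"v_root"}
samples = [{"v_root"}, {"v_root", "v1"}, {"v_root", "v1", "v2"}]
rho = rho_statistic(samples, root)
```

Because the age scales linearly with rho, the choice of calibration rate for complete mtDNA sequences shifts every TMRCA estimate proportionally.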
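The panel descriptions rely on each Y-SNP defining a branch of a phylogenetic tree, so that a sample's terminal haplogroup is the deepest branch whose defining SNP carries the derived allele. The Python sketch below illustrates that logic on a heavily simplified, hypothetical excerpt built from haplogroup labels mentioned in the abstract, with intermediate nodes omitted. It is not the study's 215-locus tree or its assignment software, and the helper names are hypothetical.

```python
# Hypothetical toy tree: node -> parent (None marks a root).
# Intermediate branches are omitted; this is NOT the study's tree.
TOY_TREE = {
    "C-M130": None,
    "C2-M217": "C-M130",
    "C2a1a2a-M86": "C2-M217",
    "C2a1a1b1-F1756": "C2-M217",
    "O2-M122": None,
    "O2a2b1a1a1a4a-CTS4658": "O2-M122",
}


def defining_snp(node):
    """The SNP name follows the first hyphen, e.g. 'C2-M217' -> 'M217'."""
    return node.split("-", 1)[1]


def path_to_root(tree, node):
    """List of nodes from `node` up to its root, inclusive."""
    path = []
    while node is not None:
        path.append(node)
        node = tree[node]
    return path


def assign_haplogroup(tree, genotypes):
    """Deepest node whose SNP is derived and whose ancestors are not ancestral.

    `genotypes` maps SNP name -> True (derived) or False (ancestral);
    untyped SNPs are simply absent. Ties at equal depth keep the first node
    found, which would signal conflicting calls in real data.
    """
    best, best_depth = None, -1
    for node in tree:
        if genotypes.get(defining_snp(node)) is not True:
            continue
        path = path_to_root(tree, node)
        if any(genotypes.get(defining_snp(a)) is False for a in path):
            continue  # an ancestral call upstream contradicts this branch
        if len(path) > best_depth:
            best, best_depth = node, len(path)
    return best
```

For example, a sample derived at M130, M217 and M86 but ancestral at F1756 is placed in `C2a1a2a-M86`; if M86 were untyped, the same sample would stop at `C2-M217`.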