| Background: DNA profiling with sets of highly polymorphic autosomal short tandem repeat (STR) markers is now applied in many aspects of human identification in forensic investigations. Autosomal STR profiles are generated from biological materials found at crime or disaster scenes and compared with the profiles of known suspects identified by police investigations or held in national forensic DNA databases. A profile match provides strong evidence for individual identification and clues for police investigations. However, because of population growth and the limited coverage of databases, a profile may find no match, and traditional investigative methods then cannot identify the suspect. Extracting more information from biological materials found at crime scenes has therefore become a new challenge in forensic research. Forensic ancestry inference from DNA can provide clues for determining the direction of an investigation and narrowing its scope. Ancestry informative markers (AIMs) are markers that show strong allele frequency differences between populations from different geographic regions. Current studies on AIMs are mostly based on third-generation genetic markers, namely SNPs or InDels. In addition to these traditional AIM-SNP/InDel markers, microhaplotypes, a newer type of marker consisting of linked SNPs, have become an important part of forensic ancestry inference research in recent years. A microhaplotype (MH) is a genetic marker comprising two or more closely linked SNPs within a 300 bp DNA fragment and has at least three haplotypes. Microhaplotypes combine the advantages of STRs and SNPs while avoiding the disadvantages of both. With shorter amplified fragments, better polymorphism and lower mutation rates, and without preferential amplification or stutter peaks, MHs have great application potential in forensic genetics. With the development of sequencing technology, massively parallel sequencing (MPS) has
become the most common method for detecting MHs, but it is still difficult to apply in forensic genetics laboratories. It is therefore necessary to explore economical, convenient and sensitive detection technologies for forensic practice. Although researchers have screened a large number of AIMs based on a variety of genetic markers and carried out many biogeographic ancestry inference studies, global population reference data sets with high coverage are still lacking. Moreover, many AIM sets were developed to infer biogeographic ancestry at the continental level. China is a large multi-ethnic country with a vast territory, varied landforms and a complex population structure. When AIM sets designed for intercontinental groups are applied to Chinese populations, the resolution of ancestry inference is clearly limited. There is an urgent need to develop ancestry informative markers specific to Chinese and East Asian populations and to supply ancestry reference data for these populations. Objective: This study aims to screen new microhaplotype markers for biogeographic ancestry inference in China; all microhaplotype markers screened in this study were SNP-SNP markers. It also aims to explore a relatively economical, convenient and sensitive detection method for microhaplotype markers and to establish the corresponding detection systems. Samples from different populations were genotyped, and population genetic analyses were conducted on the studied population data together with global population data to evaluate the effectiveness of the system for biogeographic ancestry inference, in the hope that it can provide clues for police investigations. A further aim is to explore the fine genetic structure of Chinese populations and to provide ancestry reference data for them. Methods: Bioinformatic analysis was used to preliminarily screen microhaplotypes based on East
Asian population data from the Phase III data of the 1000 Genomes Project, and a preliminary SNaPshot detection system was constructed. A small number of samples were first tested to verify the validity of the detection system and of the markers for ancestry inference. A total of 220 unrelated individuals from 11 populations (20 Chengdu Hans, 20 Muli Tibetans, 20 Dujiangyan Tibetans, 20 Xichang Yis, 20 Zunyi Gelaos, 20 Wuzhong Huis, 20 Hainan Lis, 20 Hainan Hans, 20 Ordos Mongolians, 20 Kumul Uyghurs and 20 Tibetan Sherpas) were typed for the individual SNPs using SNaPshot, and haplotypes were phased using the program PHASE. On this basis, the markers were screened again and the final multiplex SNaPshot detection systems were established. A next-generation sequencing detection system based on the Ion S5 XL platform was constructed to assess the consistency of the results obtained with the SNaPshot detection system and the PHASE program. A total of 828 unrelated individuals from the 11 populations were genotyped using the final SNaPshot detection system for further population genetic analysis to validate the ancestry inference efficiency of the 21 MH markers. Based on data from the 11 studied populations and 37 global populations, the informativeness for assignment (In) and pairwise population genetic distances were calculated, and principal component analysis, multidimensional scaling analysis, phylogenetic analysis and ancestry component analysis in STRUCTURE were conducted to evaluate the efficiency of ancestry inference. Ancestry affiliation prediction was also performed using Snipper. Results: Bioinformatic analysis of East Asian population data from the Phase III data of the 1000 Genomes Project yielded 44 candidate markers. Seven preliminary SNaPshot detection systems were constructed. A small sample of 220 individuals from 11 populations was genotyped for the individual SNPs using SNaPshot and phased using the program PHASE. After we analyzed the
data and deleted the markers that could not be effectively detected or that showed little difference in allele frequency distribution among the experimental populations, 21 microhaplotype markers suitable for ancestry inference in Chinese populations were finally obtained. Two multiplex SNaPshot detection systems were successfully constructed. Validation tests suggested that both systems had high sensitivity, accuracy and stability. We also constructed a next-generation sequencing detection system for the 21 microhaplotype markers. In the consistency study comparing it with the multiplex SNaPshot detection systems and PHASE software, the sequencing results showed that for one of the 21 markers, MH36 (rs147756206-rs537463798), more than 50% of samples were not effectively amplified and complete genotypes could not be obtained. The sequencing results for the other loci and samples were completely consistent with those from the multiplex SNaPshot detection systems. All original sequencing BAM files were inspected in IGV, and the haplotypes of the 21 MH markers in all samples were completely consistent with the phasing results inferred by PHASE. These results indicate that detecting SNPs with SNaPshot and then phasing with PHASE can be applied to microhaplotype detection. Among the 21 microhaplotype markers, two (MH10 and MH24) had four haplotypes, while all the others had three. At three loci (MH06, MH24 and MH42), only two haplotypes were found in some populations. At MH10, all four haplotypes (AA, AG, GA and GG) were found only in the Hainan Li population, and only one sample carried AA. At MH24, four haplotypes (CA, CG, TA and TG) were found in Xichang Yis, Zunyi Gelaos, Hainan Hans and Ordos
Mongolians; three haplotypes (CG, TA and TG) in Chengdu Hans, Muli Tibetans, Wuzhong Huis and Kumul Uyghurs; three haplotypes (CA, CG and TA) in Dujiangyan Tibetans; and two haplotypes (CG and TA) in Hainan Lis and Tibetan Sherpas. Apart from two African populations (LWK and YRI), the CA haplotype at MH24 was found only in East Asian populations. In the 37 intercontinental populations, nine of the 21 SNP-SNP markers (MH06, MH07, MH10, MH14, MH19, MH22, MH23, MH24 and MH36) showed four haplotypes in some populations, and one of them, MH07, had only two haplotypes in some populations; the remaining markers all had three haplotypes. The forensic parameters show that the 21 microhaplotype markers also have high application potential in individual identification and kinship analysis. The In values, genetic distances, principal component analysis, multidimensional scaling analysis, phylogenetic analysis and STRUCTURE analysis all suggested that the 21 markers have good ancestry inference efficiency in both Chinese and intercontinental populations. The 21 MH loci can effectively distinguish the Hainan Li, Kumul Uyghur and Tibetan Sherpa populations from the other groups among the 11 studied populations. The other groups fall into two clusters: the Tibetan-Yi corridor populations (Dujiangyan Tibetan, Muli Tibetan and Xichang Yi) form one cluster, and the Hainan Han, Chengdu Han, Zunyi Gelao, Wuzhong Hui and Ordos Mongolian populations form the other. Among intercontinental populations, the system effectively distinguishes African and East Asian populations from other populations. Among the remaining populations, the American, European and South Asian populations were grouped into three separate subpopulations. The Snipper results indicated that all 33 test samples except three Uyghur samples were accurately predicted to come from the East Asian population at the intercontinental
population level. At the level of local populations in China, all samples from Hainan Li, Dujiangyan Tibetan, Xichang Yi, Zunyi Gelao, Tibetan Sherpa, Wuzhong Hui and Kumul Uyghur were accurately assigned to their source populations. Five samples from Hainan Han, Chengdu Han, Muli Tibetan and Ordos Mongolian were not assigned to their true source populations, but all samples except one Mongolian sample were assigned to the correct geographical location. At a finer population level within China, all samples from Hainan Li, Xichang Yi, Zunyi Gelao, Tibetan Sherpa, Wuzhong Hui and Kumul Uyghur were accurately assigned to their source populations. In Hainan Han, Chengdu Han, Dujiangyan Tibetan, Muli Tibetan and Ordos Mongolian, eight samples were wrongly assigned to other ethnic groups in neighboring areas or to the same ethnic group in a different geographical location, but the top three predicted source groups included their true source populations. The results show that the 21 microhaplotype markers have good application potential in the forensic ancestry inference of Chinese populations and can provide useful reference information for investigations in forensic practice. Conclusion: This study screened out 21 microhaplotype markers specifically for ancestry inference in Chinese populations. Two multiplex SNaPshot detection systems and a next-generation sequencing detection system were successfully constructed, providing new methods for forensic ancestry inference research. A detection method combining SNaPshot technology and PHASE software was established for typing SNP-SNP microhaplotype markers, and next-generation sequencing confirmed that this method is accurate and reliable, providing a new technique for microhaplotype detection. The 21 microhaplotype genetic markers showed good
ancestry inference resolution in both Chinese and intercontinental populations. The 21 markers also showed high application potential in individual identification and kinship analysis. The fine genetic structure of Chinese and East Asian populations was explored using the 21 microhaplotype markers, and this study provides ancestry reference data for Chinese populations. |
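
The In statistic named in the Methods can be illustrated with a minimal sketch. It assumes that In denotes Rosenberg's informativeness for assignment, computed for one marker from haplotype frequencies in K populations as the sum over haplotypes of -p̄·ln(p̄) + (1/K)·Σ p_i·ln(p_i), where p̄ is the mean frequency across populations. The function name `informativeness` and the frequencies below are hypothetical and are not taken from the study's data; the abstract does not describe the software actually used for this calculation.

```python
import math


def informativeness(freqs_by_pop):
    """Informativeness for assignment (In) of one marker.

    freqs_by_pop: list of dicts, one per population, mapping
    haplotype label -> frequency (frequencies sum to 1 per population).
    Terms with zero frequency contribute 0 (0 * ln 0 is taken as 0).
    """
    k = len(freqs_by_pop)
    haplotypes = set().union(*freqs_by_pop)  # all labels seen in any population
    total = 0.0
    for h in haplotypes:
        p = [pop.get(h, 0.0) for pop in freqs_by_pop]
        mean = sum(p) / k
        if mean > 0:
            total -= mean * math.log(mean)
        total += sum(x * math.log(x) for x in p if x > 0) / k
    return total


# Hypothetical frequencies for a two-SNP microhaplotype in two populations
# (illustrative values only, not results from the study).
mh_example = [
    {"CG": 0.6, "TA": 0.4},
    {"CG": 0.1, "TA": 0.7, "CA": 0.2},
]
print(round(informativeness(mh_example), 4))
```

Under this definition In is 0 when all populations share the same haplotype frequencies and reaches ln(K) when each population is fixed for a different haplotype, so larger values indicate markers that separate the populations better.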