
Development of a Microhaplotype Reconstruction and Analysis Pipeline for Targeted Sequencing Data

Posted on: 2022-09-13  Degree: Master  Type: Thesis
Country: China  Candidate: K Chen  Full Text: PDF
GTID: 2480306497969129  Subject: Biochemistry and Molecular Biology
Abstract/Summary:
Short tandem repeats (STRs) are widely used in forensic practice, including individual identification, paternity testing, and mixed sample analysis, because of their high polymorphism. However, STR typing has clear limitations in mixture analysis, especially for unbalanced mixed samples: a relatively high mutation rate, stutter peaks from PCR amplification, and so on. Compared with STRs, single nucleotide polymorphisms (SNPs) are more widely distributed, have a lower mutation rate, need shorter amplicons, and produce fewer stutter products, which makes them well suited to high-throughput sequencing. Because SNPs are biallelic, however, up to thousands of them are required for individual identification and mixed sample analysis in forensic practice. Forensic medicine therefore needs genetic markers that are highly polymorphic and can be detected by next-generation sequencing. A microhaplotype is a region in linkage disequilibrium that is shorter than 300 base pairs and contains two or more SNPs. As a new genetic marker, the microhaplotype combines the high polymorphism of STRs with the low mutation rate, short amplicons, and high-throughput sequencing compatibility of SNPs.

There are, however, few reports on screening highly polymorphic microhaplotype loci specific to the Chinese population. In addition, whole-genome sequencing and whole-exome sequencing are relatively expensive for studies of this kind, which involve many samples but few targets. A targeted sequencing strategy based on multiplex PCR is more economical and efficient. Our lab previously proposed that blunt primers can improve the efficiency of multiplex PCR target library construction; this method effectively reduces primer dimer formation during PCR and makes the sequencing depth across targets more uniform. On the data analysis side, however, no pipeline has been developed for microhaplotype analysis of multiplex PCR targeted sequencing data. It is therefore important to build an automated pipeline for microhaplotype identification and to screen for microhaplotype loci that are highly polymorphic in the Chinese population.

First, we constructed and evaluated the microhaplotype targeted sequencing pipeline. After comparing the running time, success rate, and accuracy of five paired-end read merging tools, we chose PANDAseq to merge paired-end reads and wrote the in-house pipeline with Snakemake. An alternative, assembly-free pipeline is also provided. We then manually reviewed the analysis results for the sequencing data of 3 Han samples from South China in IGV; the pipeline output was consistent with what IGV displayed. Second, using the pipeline, we verified the polymorphism and forensic performance of 20 microhaplotypes in the Han population of South China. The 20 highly polymorphic microhaplotype loci were selected from the ALFRED database based on data for the Han population in Beijing, China, and the Han population in South China. The pipeline was then used to analyze multiplex PCR targeted sequencing data from 96 unrelated Han individuals from South China, and the heterozygosity, minimum allele frequency, effective number of alleles, and individual identification rate (discrimination power) of these loci were calculated. Finally, the 1000 Genomes Project was used as a reference data set to screen, according to preset criteria, for microhaplotype loci that meet the requirements of targeted sequencing and are highly polymorphic in the Chinese population. We used the PHASE software to infer the alleles of microhaplotype loci in the Han Chinese in Beijing and the Han Chinese in South China from the 1000 Genomes Project data. We then selected 44 microhaplotype loci from the initial candidates and used the Simons Genome Diversity Project data set as validation data for a preliminary study of the polymorphism and forensic performance of these 44 loci.

In summary, building on an evaluation of bioinformatics software for targeted sequencing, we established a simple and convenient analysis pipeline for microhaplotype targeted sequencing and verified its accuracy with multiplex PCR targeted sequencing data from the Han population of South China. In addition, highly polymorphic microhaplotype loci in the Chinese population were screened using data from the 1000 Genomes Project, and data from the Simons Genome Diversity Project indicated that the selected loci are highly polymorphic in East Asian populations and useful for forensic practice. This study provides additional options for microhaplotype identification from multiplex PCR targeted sequencing data and for screening highly polymorphic microhaplotype loci in the Chinese population.
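The per-locus statistics named in the abstract (heterozygosity, minimum allele frequency, effective number of alleles, discrimination power) can be illustrated with a minimal Python sketch. It uses the standard textbook definitions; the abstract does not state the exact formulas used in the thesis, so these are an assumption. The function name `locus_statistics` and the toy genotypes are hypothetical and are not taken from the thesis or its pipeline.

```python
from collections import Counter


def locus_statistics(genotypes):
    """Summarize one microhaplotype locus.

    genotypes: list of (allele1, allele2) tuples, one per diploid individual,
    where each allele is the string of SNP bases along the microhaplotype.
    Formulas are standard definitions, assumed rather than taken from the thesis.
    """
    n = len(genotypes)
    if n == 0:
        raise ValueError("at least one genotype is required")

    # Allele frequencies from the 2n chromosomes observed at this locus.
    allele_counts = Counter(a for pair in genotypes for a in pair)
    freqs = {a: c / (2 * n) for a, c in allele_counts.items()}

    # Sum of squared allele frequencies (expected homozygosity).
    homozygosity = sum(p * p for p in freqs.values())

    # Unordered genotype frequencies, used for discrimination power.
    geno_counts = Counter(tuple(sorted(pair)) for pair in genotypes)
    match_prob = sum((c / n) ** 2 for c in geno_counts.values())

    return {
        "expected_heterozygosity": 1 - homozygosity,
        "observed_heterozygosity": sum(a != b for a, b in genotypes) / n,
        "effective_alleles": 1 / homozygosity,
        "min_allele_frequency": min(freqs.values()),
        "discrimination_power": 1 - match_prob,
    }


if __name__ == "__main__":
    # Toy data for illustration only: four individuals, three alleles.
    toy = [("ACG", "ACG"), ("ACG", "ATG"), ("ATG", "GTG"), ("GTG", "ACG")]
    for name, value in locus_statistics(toy).items():
        print(f"{name}: {value:.4f}")
```

With real data, the genotype tuples would come from the allele calls produced per sample and per locus; sorting each pair before counting makes the genotype ("A", "B") equivalent to ("B", "A").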
Keywords/Search Tags:Bioinformatics, Microhaplotype, Multiplex PCR, Targeted sequencing, Highly polymorphic loci