SNP Calling And Its Preliminary Application Based On Next Generation Sequencing Data

Posted on: 2014-02-25 | Degree: Master | Type: Thesis
Country: China | Candidate: Y H Gao
GTID: 2250330401970944 | Subject: Genetics
Abstract/Summary:
Next generation sequencing (NGS) has recently been widely used in biomedicine, genomics, transcriptomics, systems biology and other disciplines. It has rapidly moved research areas such as disease gene mapping, crop genetics and breeding, and epigenetics from the single-gene level to the genome-wide scale. NGS provides fast, low-cost, massively parallel DNA sequencing, but the huge volume of data it produces poses major challenges for downstream analysis. Starting from the basic principles of NGS and addressing the key points and difficulties in processing and analyzing polymorphisms in NGS data, this thesis assembles a practical analysis pipeline, applies it successfully to the yeast genome, and obtains reasonable results. It then systematically evaluates the sequencing quality of the Ion Torrent platform, with initial improvements and some discussion.
Analyzing Ion Torrent re-sequencing data from E. coli, whose genome is known, we find statistically that the error probability increases markedly with homopolymer length (the length of a run of identical nucleotides). We also observe cases in which sequenced bases and reference bases appear swapped with each other. After removing errors in homopolymers longer than 2 bp and these swap-type mismatches, the insertion, deletion and mismatch error rates drop to 0.13%, 0.12% and 0.05% respectively, about half of the raw error rates, and the proportion of error-free reads rises from 48.30% to 67.90%. Because the removed bases account for only 1.13% of the total, removing a small number of bases with high sequencing error rates can significantly improve overall sequencing accuracy.
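The homopolymer-stratified error analysis described above can be sketched in a few functions. This is a minimal illustration under stated assumptions, not the code used in the thesis: it assumes sequencing errors have already been extracted from read-to-reference alignments as hypothetical `(ref_pos, err_type)` tuples (with insertions assigned to a neighboring reference position), that per-position read depth is known, and that swap-type mismatches carry a hypothetical `"swap"` label. All function names are hypothetical.

```python
from collections import Counter


def homopolymer_lengths(ref):
    """For each reference position, return the length of the homopolymer run containing it."""
    lengths = []
    i = 0
    while i < len(ref):
        j = i
        while j < len(ref) and ref[j] == ref[i]:
            j += 1
        lengths.extend([j - i] * (j - i))  # every base in the run gets the run length
        i = j
    return lengths


def error_rate_by_run_length(ref, depth, errors):
    """Per-base error rate stratified by homopolymer length.

    depth[i]: number of reads covering reference position i (hypothetical input).
    errors:   (ref_pos, err_type) tuples, e.g. "ins", "del", "mis", "swap" (hypothetical labels).
    """
    run_len = homopolymer_lengths(ref)
    observed = Counter()                 # run length -> number of sequenced bases
    for pos, d in enumerate(depth):
        observed[run_len[pos]] += d
    errs = Counter(run_len[pos] for pos, _ in errors)  # run length -> number of errors
    return {L: errs[L] / observed[L] for L in sorted(observed) if observed[L]}


def drop_homopolymer_and_swap_errors(ref, errors, max_run=2):
    """Keep only errors outside homopolymers longer than max_run that are not swap-type."""
    run_len = homopolymer_lengths(ref)
    return [(pos, t) for pos, t in errors if run_len[pos] <= max_run and t != "swap"]


if __name__ == "__main__":
    ref = "ACGTTTTAC"            # toy reference with one 4-bp T homopolymer
    depth = [10] * len(ref)      # toy uniform coverage
    errors = [(0, "mis"), (3, "del"), (4, "ins"), (6, "del"), (8, "swap")]
    print(error_rate_by_run_length(ref, depth, errors))   # {1: 0.04, 4: 0.075}
    print(drop_homopolymer_and_swap_errors(ref, errors))  # [(0, 'mis')]
```

On this toy input the error rate inside the 4-bp homopolymer (0.075) exceeds the rate at single-base runs (0.04), which is the kind of trend the abstract reports; the numbers themselves are illustrative only.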
Taken together, these results and trends can help further improve Ion Torrent sequencing quality: phase errors in Ion Torrent base reading, a higher probability of sequencing errors at homopolymer edges than at other positions, and, after filtering, clearly lower quality values for mismatched bases than for error-free bases.
After sequence alignment, base recalibration, realignment, genotyping, SNP calling and (optional) filtering, the 454 pyrosequencing data from 22 yeast strains (average depth 18x) yield 397,382 SNP loci, including 276,925 transitions (Ti) and 128,467 transversions (Tv), a Ti/Tv ratio of 2.156. This proportion is consistent with the genome-wide SNP type distribution, and SNPs occur in the yeast genome at a frequency of about one per 30 bp. After LOF analysis, 346,647 (87.23%) SNP sites lie in exon regions, 42,911 in intron regions and 200 in UTRs, and 119,537 of the exonic SNPs are non-synonymous. Of all SNPs, 204,407 (51.44%) are specific to a single strain. Based on this combined information, we believe that strains with different genetic backgrounds are associated with their functional classification, and that their SNP information can provide material for research on species evolution, differential expression, genetics and other topics.
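The Ti/Tv and strain-specificity figures above rest on simple counting rules: a transition is a purine-to-purine (A<->G) or pyrimidine-to-pyrimidine (C<->T) change, and every other base change is a transversion. A minimal sketch, assuming biallelic SNPs given as hypothetical `(ref_base, alt_base)` pairs and a hypothetical mapping from each site to the set of strains carrying the alternative allele; it illustrates the arithmetic and is not the pipeline used in the thesis.

```python
BASES = set("ACGT")
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def is_transition(ref, alt):
    """True for A<->G or C<->T substitutions; False for transversions."""
    ref, alt = ref.upper(), alt.upper()
    if ref not in BASES or alt not in BASES or ref == alt:
        raise ValueError(f"not a single-base substitution: {ref}>{alt}")
    return {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES


def count_ti_tv(snps):
    """Count transitions and transversions among (ref_base, alt_base) pairs."""
    ti = tv = 0
    for ref, alt in snps:
        if is_transition(ref, alt):
            ti += 1
        else:
            tv += 1
    return ti, tv


def strain_specific_sites(carriers):
    """Sites whose alternative allele is carried by exactly one strain.

    carriers: hypothetical dict of site id -> set of strain names with the alt allele.
    """
    return [site for site, strains in carriers.items() if len(strains) == 1]


if __name__ == "__main__":
    ti, tv = count_ti_tv([("A", "G"), ("C", "T"), ("A", "T"), ("G", "C"), ("T", "C")])
    print(ti, tv, round(ti / tv, 3))   # 3 2 1.5
    # Counts reported in the abstract: 276,925 Ti and 128,467 Tv.
    print(round(276925 / 128467, 3))   # 2.156
```

Plugging in the abstract's own counts reproduces the reported Ti/Tv ratio of 2.156; likewise 204,407 / 397,382 rounds to the reported 51.44% of single-strain SNPs.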
Keywords/Search Tags: Next generation sequencing, SNP, Analysis pipe