Establishing The Pipeline For The SNP Position In The Target Region And Comparative Analysis Based On The Next Generation Sequencing Technology

Posted on: 2017-09-13  Degree: Master  Type: Thesis
Country: China  Candidate: H Xia  Full Text: PDF
GTID: 2334330503478347  Subject: Bio-engineering
Abstract/Summary:
With the development of sequencing technology, high-throughput sequencing has been widely used in individualized diagnosis and treatment, especially in clinical diagnosis. However, one key issue remains in this area: bioinformatics pipelines suited to analyzing the sequencing data are unavailable. During my internship at the company, a novel genotyping method based on next-generation sequencing was developed. Because no suitable pipeline existed for the data produced by the new method, we set out to provide one. In this study, we built a pipeline for SNP genotyping that comprises quality control, adapter and sequencing-primer trimming, barcode splitting, alignment, SNP calling, and genotyping. The genotyping data of every sample can be called within a fairly short time after the raw sequencing data are fed into the pipeline.
In this pipeline, Cutadapt is used to remove the primers and adapters from the reads and to filter out reads whose average sequencing quality value is lower than 20. About 75% of reads remain after quality control, and 64% of reads are assigned to individual samples by Fastx or BSFI. The adapter that links the DNA sequence to the barcode is used to locate and remove the barcode and adapter from each read. Once the barcodes and adapters have been removed, the reads of each sample are fully classified. BWA is used to align the reads to the reference sequence, and Samtools is used to identify SNP variants from the alignment (SNP calling). More than 90% of the classified reads can be aligned to the reference sequence.
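The average-quality filter described above can be illustrated with a minimal sketch. It is not Cutadapt's implementation. It assumes Phred+33 quality encoding and reads already parsed into (name, sequence, quality) tuples, and the function names `mean_phred` and `filter_by_mean_quality` are hypothetical.

```python
def mean_phred(quality, offset=33):
    """Mean Phred score of a FASTQ quality string (Phred+33 assumed)."""
    if not quality:
        return 0.0
    return sum(ord(ch) - offset for ch in quality) / len(quality)


def filter_by_mean_quality(records, min_mean=20.0):
    """Yield (name, sequence, quality) records whose mean quality is >= min_mean."""
    for name, seq, qual in records:
        if mean_phred(qual) >= min_mean:
            yield name, seq, qual


# Toy reads; real input would come from a FASTQ parser.
reads = [
    ("read1", "ACGT", "IIII"),  # 'I' encodes Phred 40
    ("read2", "ACGT", "++++"),  # '+' encodes Phred 10
    ("read3", "ACGT", "5555"),  # '5' encodes Phred 20
]
kept = [name for name, _, _ in filter_by_mean_quality(reads)]
print(kept)
```

With the threshold of 20 stated in the abstract, a read whose mean quality is exactly 20 is kept, because only reads below 20 are filtered out.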
After SNP calling, a custom script determines the genotype of each sample at the corresponding position of the reference sequence, and the genotyping results are output as a txt file.
A novel program named BSFI was built in this study to replace Fastx, because barcode splitting with Fastx takes too much time. Compared with Fastx, BSFI can recognize a one-base deletion, and its splitting speed is much faster thanks to multithreading. The results show that this optimization improves the splitting speed sixfold while the accuracy is the same as that of Fastx, which allows SNP genotyping to be finished in less than one day.
From inputting the sequencing data to obtaining the genotyping results, the pipeline takes less than six hours, which can satisfy the demand for the new SNP genotyping technology in individualized diagnosis and treatment.
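The abstract does not describe how BSFI works internally, so the following is only a hypothetical sketch of barcode splitting that tolerates a one-base deletion. It assumes that each read starts with the barcode followed directly by the linking adapter, and that the deletion occurs within the barcode. The names `assign_barcode` and `one_deletion_variants`, and the toy barcodes and adapter, are illustrative and not taken from the thesis. Multithreading is omitted.

```python
def one_deletion_variants(barcode):
    """All sequences obtained by deleting exactly one base from the barcode."""
    return {barcode[:i] + barcode[i + 1:] for i in range(len(barcode))}


def assign_barcode(read, barcodes, adapter):
    """Return (sample, insert) for a read, or (None, None) if unassigned.

    An exact barcode+adapter prefix is preferred. Otherwise a prefix with one
    base deleted from the barcode is accepted if it matches a single sample.
    """
    # Pass 1: exact match of barcode immediately followed by the adapter.
    for sample, barcode in barcodes.items():
        prefix = barcode + adapter
        if read.startswith(prefix):
            return sample, read[len(prefix):]

    # Pass 2: allow one deleted base in the barcode; reject ambiguous reads.
    hits = {}
    for sample, barcode in barcodes.items():
        for variant in one_deletion_variants(barcode):
            prefix = variant + adapter
            if read.startswith(prefix):
                hits[sample] = read[len(prefix):]
                break
    if len(hits) == 1:
        return next(iter(hits.items()))
    return None, None


barcodes = {"sampleA": "ACGTAC", "sampleB": "TGCATG"}  # toy barcodes
adapter = "GAT"                                        # toy linking adapter

print(assign_barcode("ACGTACGATTTTTCCCC", barcodes, adapter))  # exact barcode
print(assign_barcode("ACGACGATGGGG", barcodes, adapter))       # one base deleted
print(assign_barcode("CCCCCCGATAAAA", barcodes, adapter))      # no match
```

Anchoring on the adapter keeps a shortened barcode from being confused with the start of the insert, which is one plausible reason the pipeline uses the adapter to locate the barcode boundary.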
Keywords/Search Tags:clinical diagnosis, Ion data, BSFI, Alignment, Genotyping