Establishing The Pipeline For The SNP Position In The Target Region And Comparative Analysis Based On The Next Generation Sequencing Technology

Posted on:2017-09-13

Degree:Master

Type:Thesis

Country:China

Candidate:H Xia

Full Text:PDF

GTID:2334330503478347

Subject:Bio-engineering

Abstract/Summary:


With the development of sequencing technology, high-throughput sequencing has become widely used in individualized diagnosis and treatment, especially in clinical diagnosis. One key problem remains, however: bioinformatics pipelines suited to analyzing the resulting sequencing data are often unavailable. During my internship at the company, a novel genotyping method based on next-generation sequencing was developed. Because no suitable pipeline existed for the data produced by the new method, this study builds one. The pipeline genotypes SNPs through the following steps: quality control, adapter and sequencing-primer trimming, barcode splitting, alignment, SNP calling, and genotyping. The genotyping data for every sample can be obtained in a fairly short time after the raw sequencing data are fed into the pipeline.

In this pipeline, Cutadapt is used to remove primers and adapters from the reads and to filter out reads whose average sequencing quality value is below 20. About 75% of reads remain after quality control, and 64% of reads are assigned to a sample by Fastx or BSFI. The adapter that links the DNA sequence to the barcode is used to remove the barcode and adapter from each read. Once the barcodes and adapters have been removed, the reads of each sample are fully classified. BWA aligns the reads to the reference sequence, and Samtools identifies SNP variants from the alignment (SNP calling). More than 90% of the classified reads can be aligned to the reference sequence.
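The quality criterion above (discard reads whose average quality value is below 20) can be illustrated with a minimal Python sketch. This is not Cutadapt's implementation and not the thesis code; the function names, the Phred+33 encoding, and the toy reads are assumptions made for illustration.

```python
def mean_phred(quality_line: str, offset: int = 33) -> float:
    """Average Phred score of one FASTQ quality string (Phred+33 assumed)."""
    if not quality_line:
        return 0.0
    return sum(ord(ch) - offset for ch in quality_line) / len(quality_line)


def filter_by_mean_quality(records, threshold: float = 20.0):
    """Yield only (name, sequence, quality) records with mean quality >= threshold."""
    for name, seq, qual in records:
        if mean_phred(qual) >= threshold:
            yield name, seq, qual


# Toy reads (hypothetical data, not from the thesis).
reads = [
    ("read1", "ACGT", "IIII"),  # 'I' encodes Phred 40
    ("read2", "ACGT", "++++"),  # '+' encodes Phred 10
    ("read3", "ACGT", "5555"),  # '5' encodes Phred 20
]
kept = [name for name, _, _ in filter_by_mean_quality(reads)]
print(kept)  # ['read1', 'read3']
```

A read sitting exactly at the threshold is kept here, since the text only excludes reads whose average is lower than 20.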
After SNP calling, a script determines the genotype of each sample at the corresponding positions of the reference sequence, and the genotyping results are written to a txt file.

A new program named BSFI was built in this study to replace Fastx, because barcode splitting with Fastx took too long. Compared with Fastx, BSFI can recognize a one-base deletion, and it splits much faster because it is multithreaded. The results show that this optimization makes splitting six times faster with the same accuracy as Fastx, so that SNP genotyping can be finished in less than one day.

From inputting the sequencing data to obtaining the genotyping results, the pipeline takes less than six hours, which satisfies the demands of the new SNP genotyping technology in individualized diagnosis and treatment.
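The abstract does not describe how BSFI works internally. As a rough sketch of the idea of barcode splitting that tolerates a one-base deletion, the Python below looks up exact barcodes first and then falls back to a table of one-deletion variants. All names, the equal-length-barcode assumption, and the sample data are hypothetical. The sketch also matches barcodes at the start of the read rather than using the linking adapter. The thread pool only shows how reads could be distributed across workers; in CPython, threads do not speed up CPU-bound work like this, so a process pool or a compiled language would be needed for a real gain.

```python
from concurrent.futures import ThreadPoolExecutor


def deletion_variants(barcode):
    """All strings obtained by deleting exactly one base from the barcode."""
    return {barcode[:i] + barcode[i + 1:] for i in range(len(barcode))}


def build_index(barcodes):
    """Return (exact, fuzzy) lookup tables; ambiguous variants map to None."""
    exact = {bc: sample for sample, bc in barcodes.items()}
    fuzzy = {}
    for sample, bc in barcodes.items():
        for variant in deletion_variants(bc):
            if variant in fuzzy and fuzzy[variant] != sample:
                fuzzy[variant] = None  # shared by two samples: unusable
            else:
                fuzzy.setdefault(variant, sample)
    return exact, fuzzy


def assign(read, exact, fuzzy, length):
    """Return (sample, trimmed_read); sample is None if no barcode matches."""
    prefix = read[:length]
    if prefix in exact:
        return exact[prefix], read[length:]
    short = read[:length - 1]  # barcode with one base missing
    sample = fuzzy.get(short)
    if sample is not None:
        return sample, read[length - 1:]
    return None, read


# Hypothetical barcodes of equal length and toy reads.
barcodes = {"sampleA": "ACGT", "sampleB": "TTAG"}
exact, fuzzy = build_index(barcodes)
reads = ["ACGTGGGCCC", "ACTGGGCCC", "TTAGAAAT", "GGGGCCCC"]

with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(lambda r: assign(r, exact, fuzzy, 4), reads))

for read, (sample, trimmed) in zip(reads, results):
    print(read, "->", sample, trimmed)
```

The second read has lost the `G` of the `ACGT` barcode and is still assigned to `sampleA`, while the last read matches no barcode and is left unassigned.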

Keywords/Search Tags:

clinical diagnosis, Ion data, BSFI, Alignment, Genotyping


Related items

1	Research On Genotyping Method Of Third-Generation Sequencing Data Based On Dynamic Programming
2	A Blockchain-based Alignment System For Cross-institutional Patient Data
3	Analysis of induced gamma oscillations with a data alignment technique in autism and attention deficit hyperactive disorder
4	Research On Entity Alignment In The Field Of Genetic Diseases
5	Hypertension Polymorphic Position Mining Detection Based On Dynamic Short Sequence Alignment Algorithm
6	Development Of An ABO Genotyping Method And Its Preliminary Application Of Prenatal Diagnosis Of Hemolytic Disease Of Newborn
7	The Correlation Of The Cervical Alignment With The Thoracolumbopelvic Alignment In Asymptomatic Population Along With Aging
8	Bioinformatics framework for genotyping microarray data analysis
9	Establishment And Application Of Novel SNP Genotyping Method
10	Modeling And Analysis Of Auxiliary Diagnosis Of Intestinal Diseases In Children Based On Clinical Data