Research Of Library Preparation And Target Capture Methods For High-throughput Sequencing And Their Applications In Molecular Genetics Of Complex Diseases

Posted on: 2021-04-23
Degree: Doctor
Type: Dissertation
Country: China
Candidate: J Zhou
Full Text: PDF
GTID: 1480306506450284
Subject: Biology
Abstract/Summary:
In recent years, the rapid development of high-throughput sequencing technology has produced a large body of genome and transcriptome sequencing data, which has greatly advanced molecular genetic studies of disease. High-throughput sequencing has accelerated the discovery of new genes for Mendelian diseases and has subsequently shown that a substantial proportion of neurodevelopmental disorders are attributable to de novo mutations in the coding regions of genes. At the same time, as comprehensive genetic landscapes have been presented for many types of cancer, such as breast cancer, colorectal cancer and acute myeloid leukemia, high-throughput sequencing has revealed the extraordinary genetic heterogeneity and complex molecular networks of cancer and has effectively defined a molecular taxonomy. With rising throughput and falling cost, the technology has been applied in research and clinical practice, especially in the identification of candidate genes and causal variants for complex diseases, non-invasive prenatal testing for fetal chromosomal aneuploidy, and cancer diagnosis, monitoring and precision medicine.

However, high-throughput sequencing still has a series of limitations in practical applications. First, the low conversion efficiency from sample to library leaves insufficient sensitivity and specificity for detecting trace rare variants. Second, the complicated sample processing procedures are time-consuming and resource-intensive. Third, the error rate of current high-throughput sequencing platforms still reaches 1% to 5%, which makes it difficult to distinguish sequencing errors from rare mutations with frequencies below 5%. It is therefore necessary to develop high-throughput sequencing methods with higher throughput, simpler operation, and higher sensitivity and specificity.

To overcome these limitations, we studied and improved the two most commonly used library preparation methods for targeted sequencing: amplicon-based and liquid-phase hybridization-based target capture. We optimized the library preparation workflow of AmpliSeq technology and the duplex sequencing method with molecular barcodes for ultra-low-frequency mutation detection, and applied the improved protocols to the analysis of schizophrenia susceptibility genes and to circulating tumor DNA analysis in biliary tract cancers.

Schizophrenia is a representative complex disease. Although its heritability is as high as 80% and a large number of genome-wide significant susceptibility genes or regions have been reported, few functional causal variants have been located by high-throughput sequencing, because the complicated sample processing procedure is unsuitable for studies with large sample sizes and the cost of whole-genome sequencing remains prohibitive. In the second part of this dissertation, we developed and optimized a library preparation method that enriches the targeted genes and regions quickly and easily, so that functional variants of schizophrenia candidate genes can be located. The improved method completes a two-step PCR with a single addition of PCR enzyme and requires no AMPure XP bead purification other than the final library purification. This makes library preparation simpler and more economical, greatly increases its throughput, and makes the workflow more practical for studies with large sample sizes.

With the optimized workflow, we performed targeted sequencing of all exons and untranslated regions of the EMB and BNIP3L genes in 1806 patients with schizophrenia and 998 healthy controls of Han Chinese origin. A total of 58 high-quality variants in EMB and 114 variants in BNIP3L were identified across the case and control groups. Seven variants in EMB and three in BNIP3L are nonsynonymous rare variants: EMB p.Ala52Thr, p.Glu66Gly, p.Ser93Cys, p.Ala118Val, p.Ile131Met, p.Gly163Arg and p.Arg238Tyr; BNIP3L (NP_004322) p.Asn18Asp, p.Gly56Glu and p.Met105Leu. None of them individually reached statistical significance. However, the nonsynonymous rare variants in BNIP3L were detected only in schizophrenia cases, and the number of carriers differed significantly between cases and controls (P = 0.035). In addition, the common variants rs3933097 (P_allele = 3.82×10^-6, P_genotype = 3.18×10^-5), located in the 3'-UTR of EMB, and rs147389989 (P_allele = 0.007, P_genotype = 0.017) in BNIP3L showed allelic and genotypic association with schizophrenia. rs1042992 and rs17310286 in BNIP3L were significantly associated with schizophrenia in meta-analyses combining the PGC, CLOZUK and our new datasets, which further supports the results of previous genome-wide association studies. On the one hand, our findings provide further evidence that EMB and BNIP3L are susceptibility genes for schizophrenia. On the other hand, we report for the first time functional and potentially causal mutations in the two genes identified by high-throughput targeted capture sequencing, which lays a foundation for functional experiments to reveal their specific roles in the etiology of schizophrenia.

Besides psychiatric diseases, cancer is another important and representative type of complex disease. The development of high-throughput sequencing and the reduction in its cost have opened new avenues for early cancer screening, monitoring and precision medicine. In particular, non-invasive liquid biopsy based on the detection of circulating tumor DNA in plasma has attracted increasing attention in recent years, and several testing kits have been approved for clinical use in lung cancer. However, circulating tumor DNA detection by high-throughput sequencing is much more difficult in biliary tract cancers than in lung cancer, and no breakthrough has yet been made. This is mainly due to the low conversion efficiency from sample to library and the low sensitivity and specificity of high-throughput sequencing.

To solve this problem, in the third part of this dissertation we optimized duplex sequencing with molecular barcode correction for the analysis of ultra-low-frequency circulating tumor DNA. We changed the synthesis method for the barcoded adaptors and the ligation system used during library preparation. As a result, the conversion efficiency from sample to library increased from less than 50% to more than 95%, which, together with duplex molecular barcode correction, greatly improved the sensitivity and specificity of high-throughput sequencing. With 30 ng of sample input, the sensitivity for detecting mutations at 0.5% frequency reached 100%, with a false positive rate of only 0.001%.

We then collected blood and tumor tissue from 51 patients with biliary tract cancers for whole-exome sequencing, collected their plasma before surgery and three days after surgery, and captured the exons of biliary tract cancer-related genes for duplex sequencing with molecular barcodes. Circulating tumor DNA was detected in preoperative plasma in more than 60% of patients. In total, about 50% of the somatic mutations detected in tumor tissue were also detected in preoperative plasma. The concordance rate between somatic mutations found in tumor tissue and in plasma was affected by tumor location and stage; it was generally higher in intrahepatic cholangiocarcinoma, gallbladder cancer and advanced-stage patients. Three days after surgery, the concentration of cell-free DNA in plasma showed a stress-induced increase, while the frequencies of most circulating tumor DNA mutations were significantly reduced or the mutations were cleared. Detection of circulating tumor DNA, especially TP53 mutations, in postoperative plasma was associated with poor prognosis (detection of circulating tumor DNA: P = 0.0395, HR = 6.315; detection of TP53 gene mutations: P = 0.0101, HR = 25.79).

This research generated a large amount of next-generation sequencing data. The findings above are reported here for the first time and allow our clinical collaborators to pursue research on translational application in the clinic. Building on this doctoral work, further mining of the data, collection and testing of more samples with complete clinical data, and further verification of the current conclusions will continue. Through the optimization and application of high-throughput sequencing technology, we verified disease-related genes for two complex diseases, schizophrenia and biliary tract cancers; proposed a new, accurate, non-invasive and feasible method for the diagnosis of biliary tract cancers; and identified a new biomarker for prognostic evaluation of biliary tract cancers.
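
The abstract does not spell out how duplex molecular barcode correction separates true low-frequency mutations from sequencing errors, so the following is only a minimal sketch of the general idea, not the dissertation's pipeline. It assumes reads have already been aligned, trimmed to equal length and oriented to the same reference strand; the function names, the `min_fraction` and `min_family_size` thresholds, and the toy reads are all hypothetical. Reads sharing a barcode are collapsed into one consensus per strand, and a base is kept only when both strand consensuses agree.

```python
from collections import Counter, defaultdict


def consensus(seqs, min_fraction=0.6):
    """Column-wise majority vote over equal-length reads.

    Emits 'N' at positions where no base reaches min_fraction
    (an illustrative threshold, not a value from the source).
    """
    out = []
    for column in zip(*seqs):
        base, count = Counter(column).most_common(1)[0]
        out.append(base if count / len(column) >= min_fraction else "N")
    return "".join(out)


def duplex_consensus(reads, min_family_size=2):
    """Collapse reads into duplex consensus sequences.

    reads: iterable of (barcode, strand, sequence) with strand '+' or '-'.
    A barcode family is kept only if both strands have at least
    min_family_size reads; disagreeing positions become 'N'.
    """
    families = defaultdict(lambda: defaultdict(list))
    for barcode, strand, seq in reads:
        families[barcode][strand].append(seq)

    result = {}
    for barcode, strands in families.items():
        plus, minus = strands.get("+", []), strands.get("-", [])
        if len(plus) < min_family_size or len(minus) < min_family_size:
            continue  # no support from both strands, so no duplex call
        sscs_plus, sscs_minus = consensus(plus), consensus(minus)
        result[barcode] = "".join(
            a if a == b and a != "N" else "N"
            for a, b in zip(sscs_plus, sscs_minus)
        )
    return result


# Toy data (hypothetical): the reference is ACGTAC.
reads = [
    # One read carries a random sequencing error; the family outvotes it.
    ("AAGT", "+", "ACGTAC"), ("AAGT", "+", "ACGTAC"), ("AAGT", "+", "ACGAAC"),
    ("AAGT", "-", "ACGTAC"), ("AAGT", "-", "ACGTAC"),
    # A variant present on both strands survives duplex correction.
    ("CCTA", "+", "ACTTAC"), ("CCTA", "+", "ACTTAC"),
    ("CCTA", "-", "ACTTAC"), ("CCTA", "-", "ACTTAC"),
    # A change seen on one strand only is masked.
    ("TTAC", "+", "ACGAAC"), ("TTAC", "+", "ACGAAC"),
    ("TTAC", "-", "ACGTAC"), ("TTAC", "-", "ACGTAC"),
    # Only one strand was recovered, so the family is dropped.
    ("GGCA", "+", "ACGTAC"), ("GGCA", "+", "ACGTAC"),
]
result = duplex_consensus(reads)
print(result)
```

In this toy run the isolated read error and the single-strand change are both suppressed, while the variant supported by both strands is retained. The sketch also shows why conversion efficiency matters: a molecule whose two strands are not both converted into library (the `GGCA` family here) cannot contribute a duplex call.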
Keywords/Search Tags:Next generation sequencing, Circulating tumor DNA, Molecular barcode, Biliary tract cancers, Schizophrenia, Association study