| Background and Objectives: Thalassemia is an inherited autosomal recessive blood disorder that poses a serious threat to human health, including disability and death. About 2 percent of people worldwide carry a thalassemia gene, and South China has a high incidence, with a carrier rate of 3%-24% of the population. Thalassemia is a hereditary hemolytic hemoglobinopathy characterized by defects in the hemoglobin genes that result in reduced or absent synthesis of hemoglobin peptide chains. According to the affected globin gene, thalassemia can be divided into α-thalassemia, β-thalassemia, δ-thalassemia, γ-thalassemia, δβ-thalassemia, εγδβ-thalassemia, etc. Based on clinical symptoms there are three types: thalassemia minor (thalassemia carriers), thalassemia intermedia and thalassemia major. A characteristic thalassemia facies appears clinically: enlarged head, prominent cheekbones, widened distance between the eyes and flat nose. Depending on the severity of the illness, clinical manifestations include microcytosis and hypochromia, hemolytic anemia with jaundice, fatigue and weakness, hepatosplenomegaly, the need for irregular or regular blood transfusions, fetal edema, and excessive iron deposition leading to heart failure and death. Normally, most adult hemoglobin is a tetramer of two α-globin chains and two β-globin chains that transports oxygen with the help of heme. Thalassemia is therefore roughly classified into α-thalassemia and β-thalassemia. α-thalassemia is mainly caused by α-globin gene deletions; SEA, 3.7 and 4.2 are the most common deletion types in the Chinese population, with carrier rates in the Guangxi region of 7.84%, 4.78% and 1.61%, respectively. β-thalassemia is mainly caused by β-globin gene mutations; the most prevalent mutations in the Chinese population are CD41-42 (45.2% in the Guangxi region), CD17 (24.7% in the Guangxi region), -28, CD26, IVS-II-654 and CD71-72.
These six mutations account for more than 90% of all mutation types. The α-globin gene cluster is located on chromosome band 16p13.3, and adult α-globin chains are encoded by two closely linked genes, α1 and α2; the β-globin gene cluster is located on chromosome band 11p15.5, and adult β-globin chains are encoded by the β gene. At present, at least 300 α-globin gene variants and more than 200 β-globin gene mutations have been found worldwide, while in the Chinese population at least 30 kinds of α-thalassemia (17 deletional and 13 non-deletional) and 58 kinds of β-thalassemia (52 mutations and 6 deletions) have been reported. Because thalassemia is usually inherited in an autosomal recessive manner, a child whose parents are both carriers has a 25% risk of being affected. No ideal treatment for thalassemia is available; blood transfusion and bone marrow transplantation can improve patients' lives, but both place a financial burden on the family and a serious psychological burden on the patient. To prevent this disorder, population screening, genetic counseling and genetic testing should be undertaken to prevent birth defects. To date, routine molecular diagnosis of thalassemia can detect only 3 α deletions, 3 α point mutations and 17 β point mutations, which is insufficient for detecting rare or novel variants. Accurate identification of the disease-causing mutations can therefore prevent the birth of affected children more effectively, enrich the database of known mutations and provide new targets and research ideas for treating patients with thalassemia. Second generation sequencing technology, also known as next generation sequencing or deep sequencing, can sequence hundreds of thousands to one million DNA molecules in a single run. It relies on sequencing-by-synthesis technology, i.e.
the four dNTPs are labeled with fluorescent dyes of four different colors; under the action of DNA polymerase and following the base-pairing rule, each time a dNTP is incorporated into the growing DNA strand the corresponding fluorescence is emitted, and the computer identifies the base from the fluorescence signal. Each single run can deliver data output ranging from several hundred megabytes up to several terabytes. Compared with Sanger sequencing, next generation sequencing can sequence hundreds of thousands of samples simultaneously, offering high throughput, low cost and high resolution. Target capture sequencing enriches regions of interest by hybridization and can directly and efficiently obtain large amounts of variant information. Compared with broader approaches such as whole genome sequencing, targeted sequencing is much faster and more cost-effective for investigating regions associated with human disease. The technology is therefore widely used in the detection of genetic disorders, oncology, drug development, etc. It can also be applied to discover unknown thalassemia variants that routine assays cannot detect. This study combines target capture technology with next generation sequencing to analyze thalassemia patients, processing the raw data with BWA, Samtools, GATK, Annovar and other software, together with Python and Perl scripts, to identify point mutations, small insertions and deletions, copy number variations, etc. The study aims to build a preliminary pipeline for thalassemia next generation sequencing data, establish a detailed analysis process, and create the related gene mutation spectrum. It provides a solid foundation for subsequent testing of thalassemia patients and a new research direction for the diagnosis and prevention of other genetic diseases. Materials and Methods: 1.
Study subjects: We performed a retrospective study of 192 unrelated thalassemia samples from South China, comprising 93 β-thalassemia samples without α-globin variants, 84 α-thalassemia samples without β-globin variants and 15 samples co-inheriting α- and β-globin variants. The study was approved by the ethics committee of Nanfang Hospital, and informed consent was obtained from the participants. 2. Hematological parameter tests and genotyping: All hematological parameters were measured with an automatic cell counter and a high performance liquid chromatography hemoglobin analyzer. Genomic DNA was extracted from peripheral blood using the classic phenol-chloroform extraction protocol. To genotype the samples, breakpoint-spanning PCR (Gap-PCR) was used to detect deletions and reverse dot blot (RDB) was used to detect point mutations. 3. Genomic DNA library construction: For each sample whose genomic DNA passed quality control, 100 ng was used for library construction. A Covaris S2 was used to shear the DNA into 200-500 bp fragments. After end repair, phosphorylation of the 5' ends, A-tailing of the 3' ends, adapter ligation and barcoding of each sample by several PCR cycles, the library was hybridized to a pre-designed Agilent liquid-phase microarray, and library fragment size and concentration were measured with the Qubit (Life) and 2100 (Agilent) instruments. Finally, cluster generation was performed on a cBot and 90 cycles of sequencing were run on the HiSeq 2000 or HiSeq 2500 platform. 4. Next generation sequencing analysis: Raw reads were split by sample barcode. After quality control and adapter filtering, reads were mapped to the NCBI human reference genome (hg19, GRCh37) with two algorithms, BWA-backtrack and BWA-MEM; Samtools, GATK, Annovar and in-house Python scripts were then used to detect SNP, INDEL and CNV variants and to annotate mutations.
A local deletion reference genome specific to 13 large deletions was constructed to realign discordant read pairs and identify breakpoint-spanning reads. Combining this method with estimation of the average depth in the target region increased the accuracy of copy number variant detection in the α-globin gene cluster. Results: With targeted next generation sequencing of the globin gene clusters, the average sequencing depth was above 250X, coverage of the capture region was above 92%, and regions with a depth of more than 30X accounted for 93%. In the 99 α-thalassemia samples, we detected 98 deletion cases, comprising 9 cases of 3.7/N, 2 of 4.2/N, 29 of SEA/3.7, 14 of SEA/4.2, 36 of SEA/CS, 6 of SEA/N and 2 of SEA/QS, as well as 1 case carrying a mutation. In the 108 β-thalassemia samples, we found 108 cases with mutations, including 20 homozygous, 15 heterozygous and 66 compound heterozygous cases; 7 deletion cases were also detected. All sequencing variants were consistent with the laboratory genotyping results, i.e. the deletions were consistent with the Gap-PCR results and the point mutations were consistent with the RDB results; in addition, two further mutations, CD54-58 and IVS-II-5, were detected. These results show that next generation sequencing can rapidly and accurately detect mutations in large numbers of samples. In addition, the position of the most prevalent α-thalassemia deletion, SEA, differs between the HbVar database and our results: HbVar reports the deleted region as chr16:215400-234700 (hg19, HbVar:1086), whereas the high-throughput data show it to be chr16:215396-234699 (hg19).
For the Chinese-type β-thalassemia deletion, the detected deletion consists of two parts, chr11:5191124-5237502 (hg19) and chr11:5237540-5270051 (hg19), which is inconsistent with the region previously reported in the HbVar database, chr11:5191124-5270051 (hg19, HbVar:1046). Sanger sequencing was used to verify the SEA and Chinese deletions, and the sequencing traces confirmed the next generation sequencing results. There are more than 300 α-globin gene variants and more than 200 β-globin gene mutations in thalassemia; because of the limited sample size and mutation types in our analysis, we could not include all types of mutations. With a limited sample size and known genotypes, we have preliminarily established an analysis pipeline for thalassemia NGS data that can accurately identify the various variant types, update deletion breakpoints and detect new variants. Conclusion: This study applied targeted next generation sequencing to 192 thalassemia cases to accurately detect variants in their hemoglobin genes and established a stable, high-throughput and automated analysis process. Combined with a large number of known databases, the pipeline can detect both novel and rare variants and annotate them in a form that clinicians can interpret, which will help explain the association between disease variants and pathogenesis. The pipeline also corrected the breakpoints of the SEA and Chinese deletion types. This study can provide reliable genetic data and a scientific basis for the diagnosis and treatment, prenatal diagnosis and genetic counseling of thalassemia patients, and can effectively reduce thalassemia birth defects. Thalassemia, a hereditary hemoglobinopathy, has a high carrier rate in South China. Recently, a national science and technology "973" research project was set up specifically for the prevention and treatment of birth defects, with thalassemia as one of its key research topics.
The National Health and Family Planning Commission of the PRC has also launched a series of projects, such as birth defect prevention and pre-pregnancy examination, and through these actions thalassemia has received increasing attention from society. Population screening, genetic testing, and research into the pathogenesis and treatment of thalassemia are therefore particularly important, especially accurate and rapid detection of the disease-causing genes. Next generation sequencing has been one of the hottest emerging technologies in recent years; it requires little starting material, needs no control, measures large numbers of samples at once, and gives rapid and accurate results. Gene sequencing technology can not only greatly reduce the incidence of genetic diseases and birth defects, but can also be used for disease prediction, prevention, early warning and individualized treatment. Most human diseases are associated with genes; genetic abnormalities or damage alter the function of the corresponding protein or enzyme and thereby cause disease. About 2500 diseases have corresponding genetic tests that are legally applied in clinical practice in the United States, and genetic testing has even become a routine means of disease prevention. In China, non-invasive prenatal screening for fetal Down syndrome risk has opened a new prospect for the clinical application of gene sequencing. The successful application of next generation sequencing to thalassemia patients shows that the technology is also applicable to other monogenic diseases and provides new research techniques and analysis methods for rare genetic diseases. |
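
The breakpoint comparison reported in the Results (HbVar coordinates versus the regions observed by sequencing) can be illustrated with a minimal Python sketch. It is not the in-house script used in the study: the `Interval` class and the helper names `breakpoint_shift` and `retained_gaps` are hypothetical, and the hg19 coordinates quoted in the text are assumed here to be 1-based and inclusive, which the abstract does not state.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Interval:
    """A genomic region; coordinates are ASSUMED 1-based and inclusive."""
    chrom: str
    start: int
    end: int

    @property
    def length(self) -> int:
        return self.end - self.start + 1


def breakpoint_shift(reported: Interval, observed: Interval) -> Tuple[int, int]:
    """Return (start shift, end shift) of the observed region relative to the reported one."""
    if reported.chrom != observed.chrom:
        raise ValueError("intervals are on different chromosomes")
    return observed.start - reported.start, observed.end - reported.end


def retained_gaps(segments: List[Interval]) -> List[Interval]:
    """Return the regions lying between consecutive deleted segments (i.e. not deleted)."""
    ordered = sorted(segments, key=lambda s: s.start)
    gaps = []
    for left, right in zip(ordered, ordered[1:]):
        if right.start > left.end + 1:
            gaps.append(Interval(left.chrom, left.end + 1, right.start - 1))
    return gaps


# Coordinates taken from the Results (hg19).
SEA_HBVAR = Interval("chr16", 215400, 234700)        # HbVar:1086
SEA_NGS = Interval("chr16", 215396, 234699)          # observed by sequencing
CHINESE_HBVAR = Interval("chr11", 5191124, 5270051)  # HbVar:1046
CHINESE_NGS = [
    Interval("chr11", 5191124, 5237502),
    Interval("chr11", 5237540, 5270051),
]
```

Under these assumptions, `breakpoint_shift(SEA_HBVAR, SEA_NGS)` evaluates to `(-4, -1)`, and `retained_gaps(CHINESE_NGS)` yields a single retained region, chr11:5237503-5237539, lying between the two deleted parts of the Chinese deletion.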