Methods On Modeling And Analysis Of Tumor Heterogeneity Based On SNP Arrays And NGS Data

Posted on: 2016-05-24
Degree: Master
Type: Thesis
Country: China
Candidate: H Xia
Full Text: PDF
GTID: 2284330470957905
Subject: Biomedical engineering
Abstract/Summary:
As tumor research has deepened, heterogeneity has gradually been recognized as a typical feature of individual tumor samples. Tumor heterogeneity means that distinct tumor subclones coexist in a single tumor tissue and may differ in proliferation, invasion, motility, and response to treatment. The recent advent of high-throughput, large-scale sequencing technologies provides a great opportunity to reveal the essential nature of tumors at the genomic level. Owing to limited sample purity, the complex aberration patterns of tumor subclone genomes, and the signal noise of sequencing technologies, effectively modeling massive heterogeneous tumor sequencing data remains a great challenge.

Under these circumstances, we propose two statistical methods, one for analyzing single nucleotide polymorphism (SNP) array data and one for next-generation sequencing (NGS) data, with the aim of identifying different tumor subclones and detecting tumor genomic copy number aberrations. The contents of this study are summarized as follows:

1) A comprehensive analysis is conducted of how the two signals provided by SNP array technology, Log R ratio (LRR) and B allele frequency (BAF), change with respect to different copy number aberration states. We describe the signal bias caused by several non-ideal factors of real tumor samples, such as normal cell contamination, tumor aneuploidy, and GC content. Viewing the data as a two-dimensional genome print, we discuss how tumor heterogeneity affects its overall distribution.

2) Using a matched normal sample and annotated SNPs, we convert NGS read depth into LRR and BAF through appropriate extraction and conversion. However, the LRR and BAF profiles of the two technologies differ in data size, distribution, and noise.

3) We propose an approach, named CHASE, for analyzing heterogeneous tumor SNP array data. Based on a hidden Markov model (HMM), it establishes the relationship between the observable genomic signals and the hidden genotypes. In addition, CHASE incorporates the non-ideal factors into the parameterized statistical model. The novelty of this approach is that it takes two tumor subclones into account and uses the Newton-Raphson method to solve for the proportion of cells in each subclone. We tested the approach on simulated datasets and two real breast tumor samples, and the results show that it can efficiently estimate tumor subclone proportions and identify genomic aberrations simultaneously.

4) Based on an enhanced circular binary segmentation (CBS) algorithm, we propose a statistical framework (SAPPH) for detecting tumor heterogeneity in NGS data. By separately analyzing high-confidence genomic intervals, clustering local tumor proportions, and selecting models with the Bayesian information criterion (BIC), SAPPH effectively avoids BAF signal truncation and greatly reduces computational complexity. We benchmarked SAPPH and showed that it outperforms existing methods in detecting tumor heterogeneity and copy number aberrations.

In conclusion, we believe this study provides efficient bioinformatics tools for analyzing heterogeneous tumor samples from SNP array and NGS data, and that these tools will greatly facilitate future cancer research, including finding driver mutations, deducing the evolutionary history of cancer genomes, and developing personalized treatment.
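
To make the LRR and BAF signals discussed in items 1) and 2) concrete, the following is a minimal Python sketch of the textbook relationship between copy number, normal cell contamination, and the two signals. It is not the thesis's CHASE or SAPPH model: the function names, the `purity` and `depth_ratio` parameters, and the simple linear mixture of diploid normal cells with a single tumor population are all assumptions made for illustration, and effects such as aneuploidy normalization, GC content, and multiple subclones are ignored.

```python
import math


def expected_lrr_baf(purity, tumor_total_cn, tumor_b_cn, normal_b_cn=1):
    """Expected (LRR, BAF) at one SNP for a normal/tumor cell mixture.

    Assumed (hypothetical) parameters:
      purity          fraction of tumor cells in the sample, 0..1
      tumor_total_cn  total copy number in the tumor cells
      tumor_b_cn      number of B-allele copies in the tumor cells
      normal_b_cn     number of B-allele copies in the diploid normal cells
    """
    # Average copies per cell: normal cells contribute 2, tumor cells their own count.
    total = 2 * (1 - purity) + tumor_total_cn * purity
    b_copies = normal_b_cn * (1 - purity) + tumor_b_cn * purity
    if total == 0:
        # Pure tumor with a homozygous deletion: no signal to form a ratio.
        return float("-inf"), float("nan")
    lrr = math.log2(total / 2)   # intensity relative to the diploid baseline
    baf = b_copies / total       # share of B-allele copies
    return lrr, baf


def depth_to_lrr_baf(tumor_a, tumor_b, normal_a, normal_b, depth_ratio=1.0):
    """Convert allele read counts at one SNP into LRR-like and BAF-like values.

    depth_ratio is an assumed tumor/normal library-size ratio used to put
    the two samples on the same scale before taking the log ratio.
    """
    tumor_depth = tumor_a + tumor_b
    normal_depth = normal_a + normal_b
    if tumor_depth == 0 or normal_depth == 0:
        raise ValueError("SNP has no coverage in tumor or normal sample")
    lrr = math.log2((tumor_depth / normal_depth) / depth_ratio)
    baf = tumor_b / tumor_depth
    return lrr, baf
```

Under these assumptions, a hemizygous deletion that loses the B allele in a sample with 50% tumor cells, `expected_lrr_baf(0.5, 1, 0)`, gives an LRR of log2(0.75) and a BAF of 1/3 rather than the values of log2(0.5) and 0 expected for a pure tumor, which illustrates how normal cell contamination pulls both signals toward the diploid baseline.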
Keywords/Search Tags: Tumor heterogeneity, copy number aberration, SNP arrays, NGS, HMM, CBS