| The principles of tumorigenesis and tumor progression have long been a focus of cancer research. Detecting genomic copy number alterations in tumors is the basis for discovering cancer-associated genes and has become a primary task in many cancer studies. With the development of high-throughput DNA sequencing, the experimental tools of cancer genomics have gradually shifted from traditional array-based technologies such as aCGH and SNP arrays to next-generation sequencing. Because of the huge volume of data, efficient analysis of next-generation sequencing data has become a challenging problem in the field. In addition, tumor samples often present complex issues such as normal cell contamination, genomic aneuploidy and tumor heterogeneity. These issues significantly distort the sequencing data and therefore seriously reduce the accuracy of copy number alteration detection. Algorithms for detecting cancer genomic copy number alterations therefore need to address these issues effectively. In this study, based on an analysis and summary of tumor next-generation sequencing data, several algorithms and tools are designed and developed for detecting genomic copy number alterations in different application settings. The main contents and results are summarized below: 1. An algorithm named CLImAT is proposed to detect copy number alterations and loss of heterozygosity from non-paired tumor whole-genome sequencing data; it automatically corrects for the effects of normal cell contamination and tumor aneuploidy on the data. Firstly, the algorithm applies effective signal correction and normalization procedures, including correction of GC-content and mappability bias in read depth signals by a nonparametric method, and correction of allele frequency deviation by quantile normalization. 
Secondly, the algorithm introduces a novel hidden Markov model to jointly analyze read depth and allele frequency, with parametric modeling of normal cell contamination and tumor ploidy, so that it reliably detects tumor genomic copy number alterations and loss of heterozygosity. Finally, in performance evaluations on multiple datasets, CLImAT demonstrates significant advantages on whole-genome sequencing data from complex tumor samples. 2. An algorithm named CLImAT-HET is proposed to detect genomic copy number alterations and loss of heterozygosity in the different clonal populations of a heterogeneous tumor from whole-genome sequencing data. The algorithm accounts for the effect of tumor heterogeneity on whole-genome sequencing data and adopts a factorial hidden Markov model to analyze the data. The advantages of CLImAT-HET are as follows: 1) the algorithm reasonably decomposes the mixed signals generated by multiple clonal populations and significantly improves the detection of copy number alterations and loss of heterozygosity; 2) the algorithm is more sensitive to genomic aberrations occurring in clonal populations with lower cellularity; 3) it can estimate the cellularity of each clonal population. 3. CloneCNA, an algorithm for detecting copy number alterations from paired whole-exome sequencing data of tumor and normal samples, is proposed. The algorithm applies effective data preprocessing methods to mitigate the effects of normal cell contamination, tumor aneuploidy and tumor heterogeneity on whole-exome sequencing data. CloneCNA also employs a factorial hidden Markov model to analyze tumor clonal populations together with their genomic copy number alterations and loss of heterozygosity, and parametrically models normal cell contamination, tumor ploidy and heterogeneity, so that it reliably detects copy number alterations in different clonal populations. 
In addition, the algorithm uses the Bayesian information criterion to evaluate model complexity under different numbers of tumor clonal populations and to select the optimal number. Performance evaluation on multiple datasets shows that CloneCNA achieves superior performance in detecting copy number alterations. 4. An online bioinformatics tool named DeAnnCNV is designed to detect and annotate copy number alterations from whole-exome sequencing data. The tool can process whole-exome sequencing data from multiple samples simultaneously to accurately detect copy number alterations and provide detailed visualizations of the results. In addition, the tool integrates existing bioinformatics database resources, annotates copy number alterations present in multiple samples from several perspectives, and provides useful functional information. |
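
The selection of the number of clonal populations by the Bayesian information criterion, mentioned for CloneCNA above, can be illustrated with a minimal sketch. It uses the standard definition BIC = k ln(n) - 2 ln(L), where lower is better, and is not taken from the CloneCNA implementation. The function names `bic` and `select_clone_count`, the log-likelihoods, the parameter counts and the number of observations are all hypothetical values chosen for illustration.

```python
import math


def bic(log_likelihood, n_params, n_obs):
    """Standard BIC: k * ln(n) - 2 * ln(L). Lower values are preferred."""
    return n_params * math.log(n_obs) - 2.0 * log_likelihood


def select_clone_count(fits, n_obs):
    """Pick the clone count with the lowest BIC.

    fits maps a candidate number of clonal populations to a tuple
    (maximized log-likelihood, number of free parameters) obtained
    from fitting the model with that many populations.
    """
    scores = {k: bic(ll, p, n_obs) for k, (ll, p) in fits.items()}
    best = min(scores, key=scores.get)
    return best, scores


# Hypothetical fit results (illustrative numbers, not real data):
# adding a third population barely improves the likelihood,
# so the extra parameters are penalized.
fits = {
    1: (-5200.0, 6),
    2: (-5050.0, 9),
    3: (-5046.0, 12),
}
best, scores = select_clone_count(fits, n_obs=2000)
print(best, scores)
```

With these assumed numbers, the two-population model is selected: the third population improves the log-likelihood by only 4 units, which does not offset the penalty for three additional parameters.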