
Accurate Inference Of Tumor Purity And Absolute Copy Number From High-throughput Sequencing Data

Posted on: 2021-12-12
Degree: Master
Type: Thesis
Country: China
Candidate: Z Li
Full Text: PDF
GTID: 2504306050967279
Subject: Computer Science and Technology
Abstract/Summary:
The discovery of copy number variations (CNVs), a main category of structural variation, has greatly changed our understanding of individual differences. CNVs manifest mainly as deletions and duplications of DNA segments ranging from 50 bp up to about 1 Mbp, and they are a major contributor to cancer. They therefore play an important role in human genome diversity and tumor research. The development of high-throughput sequencing technology offers clear advantages for detecting CNVs. However, most existing detection methods ignore the fact that a tumor sample is a mixture of tumor cells and normal cells, and so they do not aim to identify the absolute copy number of the tumor genome (the copy number in pure tumor cells). This mixture also affects downstream analysis of other genomic variants, such as single nucleotide variations. Accurate estimation of tumor purity is therefore a necessary step toward solving this problem.

Tumor purity is the fraction of cancerous cells within a tumor sample. Tumor samples are generally heterogeneous and contain multiple cell populations; we assume that tumor tissue can be approximated, to a large extent, as a mixture of normal cells and tumor cells. Lower tumor purity weakens the signal available for copy number detection, which degrades subsequent detection results. Tumor purity also plays an important role in the analysis of other gene mutations, methylation, and tumor heterogeneity.

In this work, we propose a new approach, AITAC, to accurately infer tumor purity and absolute copy numbers in a tumor sample from high-throughput sequencing data. In contrast to many existing algorithms for estimating tumor purity, which usually rely on pre-detected mutation genotypes (heterozygosity and homozygosity), AITAC requires only the read depths observed in regions with copy number losses. AITAC builds a nonlinear model relating tumor purity to observed and expected read depths. It uses an exhaustive search to scan tumor purity over a wide range and selects, as the optimal solution, the purity that minimizes the deviation between observed and expected read depths. Finally, based on the predicted tumor purity, it infers the absolute copy number of the sample.

We used a popular simulator to generate synthetic datasets under different configurations (i.e., different tumor purity levels) and compared AITAC with two classical methods on these datasets. AITAC performed better than both, which demonstrates the effectiveness of the method and its potential for application to real datasets. We also applied the method to real sequencing data from lung cancer patients and compared it with a classical tumor purity prediction algorithm. The results showed reasonable agreement and stability, which supports the reliability of the algorithm; we expect it to become a useful tool for researchers analyzing copy numbers in cancer genomes.
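The exhaustive purity search described in the abstract can be illustrated with a minimal Python sketch. This is not the AITAC implementation: the abstract does not give the nonlinear model, so the sketch substitutes the standard two-component mixture formula and assumes diploid normal cells, depth ratios normalized so that copy number 2 gives 1.0, and loss regions with tumor copy number 0 or 1. The function names, the grid resolution, and the absolute-deviation objective are all hypothetical choices made for illustration.

```python
# Illustrative sketch only; NOT the AITAC model or code.
# Assumptions: normal cells are diploid, and each observed value is a
# read-depth ratio normalized so that copy number 2 corresponds to 1.0.

def expected_ratio(purity, tumor_cn, normal_cn=2):
    """Expected depth ratio for a tumor/normal cell mixture."""
    return (purity * tumor_cn + (1.0 - purity) * normal_cn) / normal_cn


def estimate_purity(loss_ratios, grid_steps=100, loss_states=(0, 1)):
    """Scan purity over (0, 1] and keep the value with the smallest
    total deviation between observed and expected depth ratios.

    Each loss region is matched to whichever assumed loss state
    (tumor copy number 0 or 1) explains its ratio best.
    """
    best_purity, best_dev = None, float("inf")
    for i in range(1, grid_steps + 1):
        purity = i / grid_steps
        dev = 0.0
        for ratio in loss_ratios:
            dev += min(abs(ratio - expected_ratio(purity, cn))
                       for cn in loss_states)
        if dev < best_dev:
            best_purity, best_dev = purity, dev
    return best_purity


def absolute_copy_number(ratio, purity, normal_cn=2):
    """Invert the mixture formula and round to the nearest integer."""
    cn = (normal_cn * ratio - (1.0 - purity) * normal_cn) / purity
    return max(0, round(cn))
```

Under these assumptions, regions with a single loss state alone can leave the purity ambiguous (a ratio of 0.75 fits both a one-copy loss at purity 0.5 and a two-copy loss at purity 0.25), so the sketch is only well determined when regions at both loss levels are present. A real method would need a model and objective that handle noise and this ambiguity.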
Keywords/Search Tags:Tumor purity, absolute copy number, high-throughput sequencing data, read depth, nonlinear model