
Tumor Purity Estimation And Differential Methylation Analysis Based On DNA Methylation Data

Posted on: 2020-04-10 | Degree: Master | Type: Thesis
Country: China | Candidate: Z Z Yan
GTID: 2404330599459954 | Subject: Operational Research and Cybernetics
Abstract/Summary:
In cancer genomics research, clinically obtained tumor tissue is usually a mixture of cancer cells, normal cells, immune cells and other cell types. This tumor "impurity" introduces serious bias into subsequent genetic analyses and other test results, leading to unsatisfactory research results and even erroneous findings. At the same time, existing purity estimation methods require corresponding normal samples as a reference when estimating the purity of tumor samples, yet most tumor types in The Cancer Genome Atlas (TCGA) have no (or only a few) normal samples. To address these two problems, this thesis constructs two simple tumor purity estimation methods, GmmPurify and IPCA, based on data from the Illumina Infinium HumanMethylation450 BeadChip.

The GmmPurify method first uses a Gaussian mixture model, together with common normal samples (normal samples not matched to the tumor samples), to define an important statistic, the "information contribution value". DNA methylation sites with high information contribution values are then selected to form a set of differentially methylated sites. Finally, kernel density estimation is used to estimate tumor purity. Applied to each cancer type in TCGA, GmmPurify produces purity estimates that are highly consistent with those of two state-of-the-art methods. These results indicate that, when no normal samples matched to the tumor samples are available, GmmPurify can give a satisfactory estimate of tumor purity with the help of common normal samples.

The IPCA method is an iterative algorithm that requires no normal samples and uses only tumor samples. Through iteration, it combines principal component analysis and kernel density estimation to screen out distinctly significant methylation sites, which form a set of informative sites. Kernel density estimation on this site set yields purity estimates for 32 tumor types in TCGA, and these are highly consistent with six published sets of purity estimates. Further, based on the positive correlation between the tumor purity vector and the methylation level vector at hypermethylated sites, the tumor purity vector obtained by IPCA is used to divide all sites into hypermethylated and hypomethylated sites. This classification achieves high accuracy and AUC, and outperforms the classification obtained when blood samples are used as the normal reference. Finally, taking the minfi differential methylation analysis with normal samples as the gold standard, the ranking of differentially methylated sites obtained from IPCA is also better than that of a minfi analysis using blood samples as the normal reference. These experiments show that IPCA can serve as an effective tool for estimating tumor purity in the absence of normal samples, and as an alternative tool for classifying hyper- and hypomethylated sites and for differential methylation analysis.
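The abstract does not give the formula for the "information contribution value", so the following Python sketch only illustrates the Gaussian-mixture step on which GmmPurify builds: a two-component one-dimensional mixture fitted by expectation-maximization to the beta values pooled at one CpG site (for example, tumor samples plus common normal samples). The names `fit_gmm2` and `separation_score` are hypothetical, and using the distance between the two fitted component means as a site score is an assumed stand-in for the thesis's statistic, not its definition.

```python
import math


def _log_normal_pdf(x, mu, var):
    """Log density of N(mu, var) evaluated at x."""
    return -0.5 * math.log(2 * math.pi * var) - 0.5 * (x - mu) ** 2 / var


def _weight_of_first(d):
    """Return 1 / (1 + exp(d)) without overflowing for large |d|."""
    if d >= 0:
        e = math.exp(-d)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(d))


def fit_gmm2(xs, iters=200, min_var=1e-4):
    """Fit a two-component 1-D Gaussian mixture to xs by EM.

    Returns (weights, means, variances), each a 2-tuple ordered by mean.
    """
    if len(xs) < 2:
        raise ValueError("need at least two values")
    xs = sorted(xs)
    n = len(xs)
    low, high = xs[: n // 2], xs[n // 2:]
    # Initialise one component on the lower half of the data, one on the upper half
    mu = [sum(low) / len(low), sum(high) / len(high)]
    var = [
        max(sum((x - mu[0]) ** 2 for x in low) / len(low), min_var),
        max(sum((x - mu[1]) ** 2 for x in high) / len(high), min_var),
    ]
    w = [0.5, 0.5]
    for _ in range(iters):
        # E-step: posterior probability that each point belongs to component 0
        r = []
        for x in xs:
            l0 = math.log(w[0]) + _log_normal_pdf(x, mu[0], var[0])
            l1 = math.log(w[1]) + _log_normal_pdf(x, mu[1], var[1])
            r.append(_weight_of_first(l1 - l0))
        n0 = sum(r)
        n1 = sum(1.0 - ri for ri in r)
        if n0 < 1e-9 or n1 < 1e-9:
            break  # one component absorbed every point; keep the last estimate
        # M-step: re-estimate means, variances and weights from the responsibilities
        mu = [
            sum(ri * x for ri, x in zip(r, xs)) / n0,
            sum((1.0 - ri) * x for ri, x in zip(r, xs)) / n1,
        ]
        var = [
            max(sum(ri * (x - mu[0]) ** 2 for ri, x in zip(r, xs)) / n0, min_var),
            max(sum((1.0 - ri) * (x - mu[1]) ** 2 for ri, x in zip(r, xs)) / n1, min_var),
        ]
        w = [n0 / n, n1 / n]
    order = sorted(range(2), key=lambda k: mu[k])
    return (
        tuple(w[k] for k in order),
        tuple(mu[k] for k in order),
        tuple(var[k] for k in order),
    )


def separation_score(xs):
    """Hypothetical site score: distance between the two fitted component means."""
    _, means, _ = fit_gmm2(xs)
    return means[1] - means[0]
```

Under this assumed score, a bimodal site such as `[0.05, 0.07, 0.06, 0.08, 0.04, 0.85, 0.9, 0.88, 0.92, 0.87]` scores about 0.82, while a unimodal site clustered around 0.5 scores close to 0; how many top-scoring sites to keep is left open here.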
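Both methods end by estimating purity with a kernel density step on the selected sites. The abstract does not spell this step out, so the sketch below assumes a simple model: if normal cells are nearly unmethylated at a site that is hypermethylated in tumor cells, the observed beta value is roughly purity × 1 + (1 - purity) × 0, so beta itself is a per-site purity proxy; at a hypomethylated site 1 - beta plays that role. The sample's purity is then taken as the mode of a Gaussian kernel density estimate over the proxies. `estimate_purity`, `gaussian_kde_mode` and the rule-of-thumb bandwidth are illustrative choices, not the thesis's code.

```python
import math


def gaussian_kde_mode(values, bandwidth=None, grid_size=1001):
    """Location of the highest peak of a Gaussian KDE of values, searched on a grid over [0, 1]."""
    n = len(values)
    if n == 0:
        raise ValueError("need at least one value")
    if bandwidth is None:
        # Normal-reference rule-of-thumb bandwidth; small fixed width for constant input
        mean = sum(values) / n
        sd = math.sqrt(sum((v - mean) ** 2 for v in values) / max(n - 1, 1))
        bandwidth = 1.06 * sd * n ** (-1 / 5) if sd > 0 else 0.05
    best_x, best_density = 0.0, -1.0
    for k in range(grid_size):
        x = k / (grid_size - 1)
        # Unnormalised density: constant factors do not move the mode
        density = sum(math.exp(-0.5 * ((x - v) / bandwidth) ** 2) for v in values)
        if density > best_density:
            best_x, best_density = x, density
    return best_x


def estimate_purity(tumor_betas, site_direction):
    """Hypothetical purity estimate for one tumor sample.

    tumor_betas: dict site_id -> beta value in [0, 1] for this sample
    site_direction: dict site_id -> "hyper" or "hypo" for the informative sites
    """
    proxies = [
        tumor_betas[site] if direction == "hyper" else 1.0 - tumor_betas[site]
        for site, direction in site_direction.items()
    ]
    return gaussian_kde_mode(proxies)
```

Taking the mode rather than the mean is a design choice that keeps a few poorly behaved sites from dragging the estimate; for example, proxies clustered around 0.70 to 0.72 give an estimate in that range.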
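The hyper/hypo classification rests on the stated relationship that, across tumor samples, methylation at a hypermethylated site rises with purity; by the same mixing argument, methylation at a hypomethylated site is assumed to fall with purity. A minimal sketch under that assumption labels each site by the sign of the Pearson correlation between the purity vector and the site's methylation vector, and scores a set of correlations against reference labels with the rank-based (Mann-Whitney) AUC. The helper names and the choice of Pearson correlation are assumptions; the thesis's exact statistic is not given in the abstract.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences; 0.0 if either is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def classify_sites(purity, methylation):
    """Label sites by the sign of their correlation with the purity vector.

    purity: per-sample purity estimates
    methylation: dict site_id -> beta values in the same sample order
    Returns dict site_id -> (label, r).
    """
    result = {}
    for site, betas in methylation.items():
        r = pearson(purity, betas)
        label = "hyper" if r > 0 else "hypo" if r < 0 else "undetermined"
        result[site] = (label, r)
    return result


def auc(scores_pos, scores_neg):
    """Probability that a random positive outscores a random negative (ties count half)."""
    wins = 0.0
    for p in scores_pos:
        for q in scores_neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))
```

Here reference-hypermethylated sites would supply `scores_pos` and reference-hypomethylated sites `scores_neg`, using r as the score. The absolute correlation |r| could likewise serve as one possible ranking score for comparison against a minfi-based gold standard.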
Keywords/Search Tags: DNA methylation, tumor purity, GmmPurify, IPCA, information contribution value, differential methylation site, iterative algorithm, differential methylation analysis