Research On Bias Correction And Analysis Methods For DNA Methylation Array Data

Posted on: 2022-05-14
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Z X Wang
GTID: 1480306569982989
Subject: Computer application technology
Abstract/Summary:
With the rapid development of high-throughput sequencing technology and ongoing large-scale genome projects, epigenomic data, represented by DNA methylation array data, have grown in scale. How to analyze and apply DNA methylation array data effectively has become a prominent question in bioinformatics, and it matters for understanding the relationship between epigenetics and complex diseases. However, the different probe designs used in DNA methylation arrays introduce data bias, which reduces the accuracy of methylation array data analysis and restricts the application of the data. At the same time, existing preprocessing methods for methylation array data have limited adaptability, so new bias correction methods are urgently needed to lay a sound foundation for accurate analysis. The dissertation focuses on bias correction for DNA methylation array data and on analysis methods for cancer subtype identification and tumor purity estimation based on these data. The main content is divided into the following four aspects.

Firstly, to address the lack of a comprehensive comparison and analysis of existing bias correction algorithms for DNA methylation array data, a systematic evaluation method is proposed. By integrating multiple DNA methylation data analysis pipelines, we construct an evaluation standard for bias correction of DNA methylation array data and design evaluation metrics covering technical variation reduction, probe design bias correction, and identification of differentially methylated sites and regions. We then compare and analyze existing bias correction methods, so that researchers can choose appropriate methods and modules according to their data analysis tasks.

Secondly, to address the poor adaptability of existing bias correction methods for DNA methylation array data, a bias correction method based on the Gaussian mixture model is proposed. By modeling the methylation M value, which has good statistical properties, the method requires no prior biological assumptions and automatically identifies the breakpoints between methylation states during distribution fitting, thereby avoiding errors caused by manual settings. The method outperforms existing methods in technical variation reduction, probe design bias correction, and identification of biologically differentially methylated sites, and it effectively improves the quality of DNA methylation array data. The method is also versatile: it is suitable for bias correction of different versions of DNA methylation arrays that use two probe designs.

Thirdly, to address the problem of cancer typing, a subtype identification method for DNA methylation array data based on variational autoencoders is proposed. The method uses a variational autoencoder to map the high-dimensional features of DNA methylation array data to a low-dimensional space and extracts the latent-space features for clustering. It avoids the shortcomings of the K-means clustering algorithm by applying unsupervised hierarchical clustering, which improves the accuracy of the typing results. The method broadens the research approaches available for cancer subtype identification and provides novel methodological support for identifying new cancer subtype-specific DNA methylation patterns.

Fourthly, to address the problem of tumor sample purity estimation, a method for estimating the purity of tumor samples from DNA methylation array data is proposed. The method first extracts and fuses differentially methylated sites based on limma differential methylation analysis and Mann-Whitney U statistics and analyzes the biological significance of the extracted features. It then uses kernel density estimation to estimate purity, which improves the accuracy of the purity estimates for tumor samples.
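To make the second contribution more concrete, the sketch below shows the general idea of fitting a Gaussian mixture to M values and reading methylation states off the fitted components. It is not the dissertation's algorithm, whose details the abstract does not give. The choice of three components (unmethylated, intermediate, methylated), the plain EM fit, the simulated data, and the helper names `beta_to_m`, `fit_gmm_1d`, and `assign_state` are all assumptions made for illustration. The only standard piece is the M value definition, M = log2(beta / (1 - beta)).

```python
import math
import random


def beta_to_m(beta, eps=1e-6):
    """Convert a beta value in [0, 1] to an M value, M = log2(beta / (1 - beta))."""
    b = min(max(beta, eps), 1.0 - eps)  # clamp so the logarithm stays finite
    return math.log2(b / (1.0 - b))


def normal_pdf(x, mean, var):
    """Density of a univariate normal distribution."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def fit_gmm_1d(values, k=3, n_iter=100, min_var=1e-3):
    """Fit a k-component 1D Gaussian mixture with plain EM (illustrative only).

    Returns a list of (weight, mean, variance) tuples sorted by mean.
    """
    xs = sorted(values)
    n = len(xs)
    # Initialise the means at evenly spaced quantiles of the data.
    means = [xs[int((i + 0.5) * n / k)] for i in range(k)]
    overall_mean = sum(xs) / n
    overall_var = sum((x - overall_mean) ** 2 for x in xs) / n
    variances = [max(overall_var / k, min_var)] * k
    weights = [1.0 / k] * k

    for _ in range(n_iter):
        # E-step: responsibility of each component for each observation.
        resp = []
        for x in xs:
            dens = [w * normal_pdf(x, m, v)
                    for w, m, v in zip(weights, means, variances)]
            total = sum(dens) or 1e-300  # guard against numerical underflow
            resp.append([d / total for d in dens])
        # M-step: re-estimate weight, mean, and variance of each component.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj < 1e-12:
                continue  # leave an empty component unchanged
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var_j = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj
            variances[j] = max(var_j, min_var)

    order = sorted(range(k), key=lambda j: means[j])
    return [(weights[j], means[j], variances[j]) for j in order]


def assign_state(m_value, components):
    """Index of the most probable component; the points where this index
    changes play the role of data-driven breakpoints between states."""
    scores = [w * normal_pdf(m_value, m, v) for w, m, v in components]
    return scores.index(max(scores))


# Simulated M values (an assumption, not real array data): three states
# centred at -4, 0 and 4 with a standard deviation of 0.5.
rng = random.Random(0)
m_values = [rng.gauss(mu, 0.5) for mu in (-4.0, 0.0, 4.0) for _ in range(200)]
components = fit_gmm_1d(m_values)
for weight, mean, var in components:
    print(f"weight={weight:.2f} mean={mean:.2f} var={var:.2f}")
```

In this toy setting the state boundaries come from the fitted densities rather than from hand-picked thresholds, which is the property the abstract highlights. A real pipeline would also have to handle the two probe designs separately, which this sketch does not attempt.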
Keywords/Search Tags: DNA methylation array, bias correction, Gaussian mixture model, cancer subtype identification, tumor purity estimation