
Machine Learning Methods For Chromatin Accessibility Prediction By Integrating Multi-omics Data

Posted on: 2022-05-30
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Q Liu
GTID: 1480306746456684
Subject: Control Science and Engineering
Abstract/Summary:
With the rapid development of deep sequencing technology, the quick accumulation of multi-omics data, including genomic, transcriptomic, and epigenomic sequencing data, provides a rich resource for discovering the cell types that constitute organisms, understanding the mechanisms of gene regulation in cells, and analyzing the occurrence and development of genetic diseases. However, comprehensive interpretation of these large-scale biological data is still limited by insufficient accuracy in the functional prediction underlying the biological data and by insufficient analysis of their multi-source heterogeneity. In recent years, artificial intelligence technology, especially deep learning, has made breakthrough advances in many fields, providing a powerful tool for solving these key problems. This thesis focuses on the computational analysis of chromatin accessibility data. It comprehensively investigates machine learning methods for predicting chromatin accessibility and explores theories and methods for single-cell chromatin accessibility analysis through the integration of multi-omics data. The main research contents and innovations are as follows.

First, for the problem of chromatin accessibility prediction, a random forest-based method, kmer Forest, that integrates genome sequence and evolutionary conservation was proposed. It enables binary prediction of chromatin accessibility across the genome in a given cell line. A hybrid deep convolutional neural network named Deopen, which integrates the word frequency of short genomic fragments, was further proposed. It supports both binary prediction and continuous regression of chromatin openness signals. Large-scale cross-validation experiments show that these methods outperform existing methods and that their predictions can aid the analysis of genetic data.

Second, for the problem of cross-cell-type prediction of chromatin accessibility, a densely connected convolutional network model, DeepCAGE, that fuses genome annotation and transcriptomic data was proposed. By exploiting existing biological prior knowledge, the prediction accuracy of the model is substantially improved. A chromatin accessibility-based analysis approach was established for interpreting the genetic factors in non-coding regions associated with complex phenotypes, and it was successfully applied to the study of complex phenotypes.

Third, for the problem of cell type discovery in single-cell chromatin accessibility analysis, a cycled generative adversarial network model, scDEC, was proposed. The theoretical basis of scDEC is first demonstrated from the perspective of probability density estimation. Its superior performance is then illustrated in a series of experiments, such as cell clustering. scDEC also enables the joint analysis of single-cell chromatin accessibility and gene expression. The model simultaneously performs cell clustering and low-dimensional representation learning on single-cell chromatin data, and it facilitates subsequent studies of cell trajectory inference and cell regulation mechanisms.

This study systematically investigates several key problems in the analysis of bulk and single-cell chromatin accessibility data from the perspective of "data integration, information transfer", and it also explores fundamental problems, such as density estimation, in the interpretation of biological data. The research findings can help analyze large-scale chromatin accessibility data efficiently, promote a deeper understanding of cell regulation mechanisms, and facilitate the effective interpretation of genetic data.
Keywords/Search Tags: chromatin accessibility, machine learning, multi-omics data, neural network, single-cell