
Research On Analysis Methods Of Cancer Biological Pathway Based On Omics Data

Posted on: 2020-07-11
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Q S Zhang
GTID: 1364330590972789
Subject: Computer application technology
Abstract/Summary:
Cancer is one of the most lethal diseases. Cancer deaths are projected to reach 13 million per year worldwide by 2030. The emergence of high-throughput technologies, such as microarrays and next-generation sequencing, has motivated the investigation of cancer cells on a genome-wide scale. An important application of high-throughput molecular profiling is biomarker discovery: the identification and measurement of intrinsic features of the disease that can support clinical decision-making. Despite their successes, gene biomarkers are not free of problems. One major drawback of multi-gene biomarkers is that they often lack a proper interpretation in terms of a mechanistic link to the fundamental cell processes responsible for disease progression or therapeutic response. The other major drawback is that these gene signatures have been difficult to reproduce, particularly in heterogeneous diseases such as cancer. Given this difficulty in reliably selecting genomic features relevant to a clinical question, it is desirable to augment purely data-driven approaches with a priori knowledge of cancer biology to increase the likelihood of discovering robust biomarkers. To overcome these drawbacks, therapies targeting specific pathways have been developed. Computational pathway analysis of genomic data can help guide the use of targeted therapies by assessing which pathways are deregulated in patient subpopulations and in individual tumors. These methods offer insight into the functional mechanisms involved in cancer. However, most pathway analysis methods do not account for the complex interactions inherent to signaling pathways and cannot integrate different types of genomic data (multi-omics data). To address these limitations, this dissertation focuses on network-based pathway analysis and pathway-based cancer diagnosis. It first introduces the main methods of pathway analysis and summarizes the challenges and research progress. To overcome these challenges, several algorithms are proposed. The main contents of the dissertation are as follows.

(1) With the establishment of large-scale biological networks, network-based pathway analysis has become a research hotspot. Pathway interactions are not limited to interactions among the genes within a pathway: in genome-wide biological networks there are extensive interactions between genes inside and outside a pathway, and between pathways. Accordingly, a pathway analysis method based on a weighted gene interaction network is proposed. First, we integrate protein-protein interaction (PPI) information, gene expression profiles and pathway databases, and construct two genome-wide gene-gene interaction networks (Case and Control). Then, we extend each pathway with the limited kWalks algorithm into a small network within each of the two weighted networks. Finally, we score each pathway against the gene expression profiles, based on the correlations of its two small networks, to identify significant pathways. The proposed method is compared with other methods on public datasets. The experimental results show that the method can effectively identify cancer-related pathways.

(2) The rapid accumulation of multi-omics data provides powerful support for revealing cancer pathogenesis. Attempts to understand disease mechanisms have benefited greatly from epigenetic and transcriptomic studies. Accordingly, a network-based pathway analysis method that integrates multi-omics data is proposed. First, the edge weight between each gene pair is calculated using PCA and SCCA, integrating DNA methylation and gene expression data. Then, each pathway is extended with the limited kWalks algorithm in weighted phenotype-specific networks. Finally, by passing the gene lists of the extended pathways to classical gene set analysis, we identify altered pathways that correlate well with the corresponding cancer. The method is evaluated on three datasets. The results show that integrating DNA methylation and gene expression data through a network of known gene interactions is effective for pathway analysis. In addition, the method can be used to investigate crosstalk between pathways in large-scale biological networks, which offers a systems-level perspective on the role of pathways in cancer.

(3) With the introduction of personalized and precision medicine, traditional medical practice is shifting toward individualized treatment. As personalized pathway analysis methods have developed, cancer research based on personalized pathways has become a recent research hotspot. First, the effects of three types of pathways on cancer classification are compared and analyzed. The experimental results show that classification based on the OR-pathway performs best. Then, a risk pathway model based on personalized pathways is constructed and applied to a breast cancer dataset. The experimental results show that the method can effectively identify breast cancer-related pathways.

(4) Omics data are typically characterized by high dimensionality, relatively small sample sizes and high noise. These characteristics readily cause problems such as the curse of dimensionality and over-fitting in data mining, which degrade the performance of many classical machine learning methods. Accordingly, a robust ensemble learning paradigm that incorporates pathway information is proposed for cancer classification. First, we select the differentially expressed genes of each pathway and train an SVM on them, generating a group of base learners. Then, an optimization algorithm is used to filter the base classifiers. Finally, the optimal set of base classifiers is combined into an ensemble classifier for cancer diagnosis. The proposed method is compared with other methods on three public datasets. The results show that it achieves higher performance on most metrics and is robust. In addition, the base classifiers in the ensemble have clear biological meaning. Some core biological pathways and biological processes underlying clinically relevant phenotypes are identified by functional annotation.

Overall, our research provides a new perspective for the further study of the molecular activities and manifestations of cancer.
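
As an illustration of the scoring idea in contribution (1), the following is a minimal sketch rather than the dissertation's actual algorithm. It assumes that edge weights are Pearson correlations of gene expression along known PPI edges, and that a pathway is scored by the mean absolute change in edge weight between the Case and Control networks. The abstract specifies neither choice, the limited kWalks extension step is omitted, and all function names, gene names and values are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def weighted_network(expression, edges):
    """Weight each known interaction by the co-expression of its two genes.

    Assumption: weight = signed Pearson correlation (not stated in the abstract).
    """
    return {
        (g1, g2): pearson(expression[g1], expression[g2])
        for g1, g2 in edges
        if g1 in expression and g2 in expression
    }


def pathway_score(pathway_genes, case_net, control_net):
    """Mean absolute Case-vs-Control change in weight over edges inside the pathway.

    Assumption: a simple stand-in for the correlation-based score in the abstract.
    """
    genes = set(pathway_genes)
    diffs = [
        abs(case_net[edge] - control_net[edge])
        for edge in case_net
        if edge in control_net and edge[0] in genes and edge[1] in genes
    ]
    return sum(diffs) / len(diffs) if diffs else 0.0


# Toy data: hypothetical genes and expression values, four samples per group.
ppi_edges = [("A", "B"), ("B", "C")]
case_expr = {"A": [1, 2, 3, 4], "B": [4, 3, 2, 1], "C": [1, 2, 3, 4]}
control_expr = {"A": [1, 2, 3, 4], "B": [1, 2, 3, 4], "C": [2, 1, 2, 1]}

case_net = weighted_network(case_expr, ppi_edges)
control_net = weighted_network(control_expr, ppi_edges)
print(pathway_score(["A", "B", "C"], case_net, control_net))
```

A pathway whose internal co-expression pattern differs strongly between the two groups receives a high score under this sketch. A real analysis would also need a significance test, for example a permutation of sample labels.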
Keywords/Search Tags:pathway analysis, personalized pathway, ensemble learning, interaction network, extended pathway