| One of the major challenges and opportunities of biomedical big data derived from high-throughput sequencing is how to distill meaning from these data. In this dissertation, we focused on developing multi-step computational strategies that integrate multi-omics information to better translate genomic data into knowledge of tumorigenesis and, further, into clinical therapy. This dissertation consists of four chapters. In Chapter 1, we provide an overall introduction to big data analytics and techniques in biomedical research. In Chapter 2, we developed a dedicated simulator, RRBSsim, for benchmarking the analysis of Reduced Representation Bisulfite Sequencing (RRBS) data. We performed an extensive comparison of seven mapping algorithms for methylation analysis in both real and simulated RRBS data. Our empirical and simulated evaluations found that methylation results were less consistent between software tools for CpG sites with low sequencing depth, medium methylation level, or location on CpG island (CGI) shores or in gene bodies. Among the software tools tested, bwa-meth and BS-Seeker2 (bowtie2) are currently our preferred aligners for RRBS data in terms of recall, precision, and speed. Existing aligners cannot efficiently handle moderately methylated CpG sites or CpG sites on CGI shores or in gene bodies. Our study reveals several important features inherent in methylation data, and RRBSsim provides guidance to advance sequence-based methylation data analysis and methodological development. In Chapter 3, we profiled genome-wide DNA methylation and gene expression using RRBS and RNA-Seq in 18 tumors and matched normal tissues from non-small cell lung cancer (NSCLC) patients. We highlighted twelve known (e.g., CDO1, SLIT2, and TCF21) and eight novel (e.g., PCDH17 and IRX1) methylation-driven genes that form a potential biomarker set, supported by extensive validation in the TCGA cohort. We also validated the eight novel methylated genes using pyrosequencing in an independent cohort. Our results uncovered a number of novel methylated genes, provide new insight into the connection between DNA methylation and transcriptional regulation in NSCLC, and serve as a resource for future molecular studies. In Chapter 4, we turn to transfer RNA-derived RNA fragments (tRFs), a novel class of small non-coding RNAs that are abundant in many organisms but whose role in cancer remains largely unexplored. We report a functional genomic landscape of tRFs in 8,118 specimens across 15 cancer types from The Cancer Genome Atlas. These tRFs exhibited widespread expression, high sequence conservation, cytoplasmic localization, specific patterns of tRNA cleavage, conserved cleavage across tissues, and cell-lineage specificity. A cross-tumor analysis revealed significant commonality among tRF expression subtypes from distinct tissues of origin, characterized by upregulation of a group of tRFs of similar size and activation of cancer-associated signaling. One of the largest superclusters was composed of 22-nt 3’-tRFs upregulated in 13 cancer types, all of which share activation of Ras/MAPK, RTK, and TSC/mTOR signaling. tRF-based subgrouping provided clinically relevant stratifications and significantly improved outcome prediction by incorporating clinical variables. Additionally, we discovered 11 cancer super-driver tRFs using an effective approach for accurately exploring cross-tumor and platform trends. As a proof of concept, we performed comprehensive functional assays on a non-microRNA super-driver, 5’-IleAAT-20, and validated its oncogenic roles in lung cancer in vitro and in vivo. Our study provides a valuable tRF resource for identifying diagnostic and prognostic biomarkers, developing cancer therapies, and studying cancer pathogenesis. |