
Integrative Analysis Of Prostate Cancer Biomarkers At Systems Biology Level

Posted on: 2014-03-26
Degree: Doctor
Type: Dissertation
Country: China
Candidate: J F Jiang
GTID: 1264330431973249
Subject: Systems Biology
Abstract/Summary:
Objective
Prostate cancer is one of the most common malignancies in men in developed countries and the second leading cause of cancer death in these men. Many genetic factors are known to be involved in its incidence, but the underlying mechanism remains unclear, so finding potential biomarkers for prostate cancer is an urgent task. With the development of biotechnology, large-scale genotyping has made it possible to discover prostate cancer biomarkers, e.g., single nucleotide polymorphisms (SNPs), across the whole genome. For example, genome-wide association studies (GWAS) compare the SNPs of two groups of participants, people with the disease (cases) and similar people without it (controls), to examine whether any variant is associated with the trait. The other commonly used method is to apply microarray technology to screen extensively for differentially expressed genes. These genes are then tested for enrichment in Gene Ontology terms and pathways, or analyzed in networks, to find potential high-risk genes. However, both methods have limitations. Although GWAS have discovered thousands of risk SNPs, these account for only a small fraction of the SNPs across the genome. Moreover, most of the reported SNPs are not located in gene coding regions, which makes their functions difficult to interpret. Differential expression analysis, on the other hand, is mainly used to discover the gene or genes involved in a disease, but the role these genes play in pathogenesis is not clear. For example, how they aggregate or interact with other genes to contribute to disease risk is not fully understood.

Methods
1. Post-GWAS functional characterization of prostate cancer risk loci
i. We retrieved prostate cancer risk SNPs from the GWAS Catalog and obtained all potential SNPs based on linkage disequilibrium (LD) calculation. Expression quantitative trait loci (eQTL) data from lymphoblastoid cell lines (LCLs) and similar tissues were collected from the literature and public databases.
ii. We used ANNOVAR and pre-defined regulatory elements from UCSC database tracks to functionally annotate the SNPs, and mapped the intergenic SNPs to the eQTL data. The corresponding prostate cancer associated genes were collected for subsequent analysis.
iii. Gene Ontology (GO) and pathway enrichment analyses were performed on these prostate cancer genes, and transcription regulatory networks were constructed.
2. Top associated SNPs in prostate cancer are significantly enriched in cis-expression quantitative trait loci (cis-eQTL) and transcription factor binding sites (TFBS)
i. We defined top associated SNPs in prostate cancer as those SNPs that passed the significance level of p < 10^-3 in GWAS. Two GWAS datasets, the Cancer Genetic Markers of Susceptibility (CGEMS) and the Multiethnic Cohort (MEC), were downloaded with approval from the National Center for Biotechnology Information (NCBI) dbGaP database. eQTL and TFBS data were extracted from the public databases seeQTL and RegulomeDB.
ii. Randomization and permutation strategies were applied to examine whether the top associated SNPs were significantly enriched in eQTL and/or TFBS.
iii. A similar enrichment test was carried out on cancer-associated SNPs from the GWAS Catalog using the randomization method.
iv. We integrated the eQTL and TFBS enrichment analyses to obtain the potential functional SNPs.
3. GO-assisted co-expression analysis in prostate cancer
i. Based on prostate cancer gene expression microarray data and GO biological process (GO_BP) terms, we built a sub-expression matrix for each term.
ii. We used the WGCNA R package to calculate whether each GO_BP term was preserved between two prostate cancer expression datasets.
iii. For each preserved GO_BP term, we constructed and clustered the scale-free networks to obtain the co-expression modules.
iv. To evaluate the significance of the co-expression modules, we calculated the eigengene for each module and determined whether the modules were differentially expressed between cases and controls and preserved between the two prostate cancer datasets.
v. Gene set enrichment analysis was performed on the modules identified in step iv, using gene sets such as genes implicated by eQTLs, genes covered by copy number variation (CNV), and mutated genes.
vi. If a module was significantly enriched in eQTL genes and in at least one of the CNV or mutation gene sets, we defined it as a prostate cancer risk module. Risk modules were further analyzed by transcription factor (TF) and microRNA (miRNA) enrichment. In addition, we calculated the TF-trait association based on expression profiles.

Results
1. Post-GWAS functional characterization of prostate cancer risk loci
We extracted 49 prostate cancer risk SNPs from the GWAS Catalog and obtained 1828 potential SNPs after LD calculation. ANNOVAR annotation showed that, of the 1828 SNPs, 8, 599, 377, 4, 12, 6, and 10 SNPs were found in exon, intron, ncRNA, 5'UTR, 3'UTR, upstream, and downstream regions, respectively, while the remaining 852 SNPs were located in intergenic regions. We also found 284 SNPs located in genomic regions of pre-defined UCSC regulatory elements, of which only 86 were intergenic SNPs. In the eQTL mapping, 138 intergenic SNPs were successfully interpreted. In total, we compiled a set of 205 unique prostate cancer (PCa) risk genes, including 41 genes from ANNOVAR annotation using UCSC known genes, 151 genes from eQTL mapping, and 33 genes reported by the 14 GWAS publications. Through GO and pathway analyses, we found that these prostate cancer genes were significantly enriched in cancer-related terms and pathways, such as regulation of cell death, apoptosis, and cell proliferation. We further reconstructed the transcription regulatory networks and found several important genetic regulators for PCa, such as the IGF-1/IGF-2 receptor and the SP1, CREB1, and androgen receptor (AR) transcription factors.
2. Top associated SNPs in prostate cancer are significantly enriched in cis-expression quantitative trait loci (cis-eQTL) and transcription factor binding sites (TFBS)
We carefully compared the randomization and permutation strategies and found that, for association data with a moderate proportion of eQTL SNPs (eSNPs), such as the prostate cancer data, randomization would overestimate the number of eSNPs, leading to a false negative. We therefore consider the permutation test to be the more accurate of the two, since it estimates a null distribution that is close to the truth. Our enrichment analysis indicated that top associated SNPs were significantly enriched in cis-eQTL and TFBS in the Caucasian (CEU) population. However, we did not observe such an enrichment pattern in either the African American (AA) or the Japanese (JPT) population. This difference in enrichment was further validated in the analysis of pan-cancer related SNPs from the GWAS Catalog, indicating a population-specific enrichment pattern of associated SNPs. Moreover, we found two functional SNPs, rs2861405 and rs4766642, using a joint enrichment analysis of cis-eQTL and TFBS applied to the CGEMS-CEU data.
3. GO-assisted co-expression analysis in prostate cancer
A total of 118 GO_BP terms were preserved between the two expression datasets, GSE17951 and GSE6956, with Zsummary > 5. For each of the 118 important GO_BP terms, we clustered the highly co-expressed genes into modules, each labeled with a color. As a result, we identified a total of 548 modules, of which 294 were significantly associated with prostate cancer (p < 0.05). As measured by preservation statistics in GSE6956, we discovered 55 preserved modules with Zsummary > 5. We further performed gene set enrichment analysis on these 55 modules using eQTL, CNV, and mutation genes, and obtained 5 prostate cancer risk modules, named M1 to M5. TF enrichment analysis showed that M1 and M3 were regulated by NFAT, while M2, M4, and M5 were enriched in SP1. miRNA analysis indicated several factors, such as hsa-miR-19a for M1 and M3, hsa-miR-15a for M4 and M5, and hsa-miR-200b for M2.

Conclusions
1. We performed a comprehensive integrative analysis of prostate cancer GWAS SNPs at the systems biology level, including GO enrichment, pathway enrichment, and network construction, providing informative insights for the functional investigation of prostate cancer associated SNPs, especially those located in intergenic regions.
2. We revealed a population-specific regulation pattern for top associated SNPs in prostate cancer. The prostate cancer risk SNPs in CEU may act on the expression of target genes through cis-regulators, such as eQTL and TFBS, a pattern that has not yet been observed in the AA and JPT populations.
3. We analyzed the prostate cancer co-expression modules based on GO knowledge and answered the following questions:
1) Which GO terms were associated with prostate cancer?
2) Which genes within a term were co-expressed?
3) Which co-expression modules were associated with prostate cancer?
4) Which modules were significantly enriched in prostate cancer risk gene sets?
5) Which genetic factors regulated the risk modules?...
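The randomization strategy named in Methods 2 can be illustrated with a minimal sketch. It is not the dissertation's implementation: the function name, its parameters, and the choice to draw random SNP sets uniformly from a background pool are assumptions made for illustration. It covers only the randomization strategy; the permutation strategy, which the abstract reports as more accurate, would require re-running the association test on genotype data with shuffled case/control labels.

```python
import random


def randomization_enrichment(top_snps, all_snps, annotated, n_iter=1000, seed=0):
    """Empirical enrichment p-value by random SNP-set sampling.

    top_snps  : unique SNP ids that passed the GWAS threshold
    all_snps  : background pool of SNP ids (must contain top_snps)
    annotated : SNP ids carrying the annotation of interest (e.g. eQTL)
    All names here are hypothetical; the sampling scheme is an assumption.
    """
    rng = random.Random(seed)
    annotated = set(annotated)
    pool = sorted(all_snps)  # fixed order so a given seed is reproducible
    k = len(top_snps)

    # Observed number of annotated SNPs among the top associated SNPs.
    observed = sum(1 for s in top_snps if s in annotated)

    # Null distribution: annotated counts in random SNP sets of the same size.
    hits = 0
    for _ in range(n_iter):
        draw = rng.sample(pool, k)
        null_count = sum(1 for s in draw if s in annotated)
        if null_count >= observed:
            hits += 1

    # Add-one correction keeps the empirical p-value strictly above zero.
    p_value = (hits + 1) / (n_iter + 1)
    return observed, p_value
```

A uniform draw ignores allele frequency and LD structure, which is one plausible reason a randomization null can differ from a permutation null on the same data.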
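The risk-module rule in Methods 3, step vi (enriched in eQTL genes and in at least one of the CNV or mutation gene sets) can be sketched as follows. The abstract does not state which enrichment statistic was used, so the hypergeometric upper-tail test here is an assumption, as are the function names, the 0.05 threshold applied to each test, and the absence of multiple-testing correction.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(N population, K successes, n draws)."""
    total = comb(N, n)
    upper = min(K, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total


def is_risk_module(module, background, eqtl_genes, cnv_genes, mutation_genes,
                   alpha=0.05):
    """Apply the rule: enriched in eQTL AND (enriched in CNV OR mutation).

    The hypergeometric test is an assumed stand-in for the enrichment
    analysis; the source does not name the statistic it used.
    """
    background = set(background)
    module = set(module) & background

    def pval(gene_set):
        # Restrict the gene set to the background before counting overlap.
        gs = set(gene_set) & background
        overlap = len(module & gs)
        return hypergeom_sf(overlap, len(background), len(gs), len(module))

    p_eqtl = pval(eqtl_genes)
    p_cnv = pval(cnv_genes)
    p_mut = pval(mutation_genes)
    return p_eqtl < alpha and (p_cnv < alpha or p_mut < alpha)
```

With 55 modules and three gene sets, a real analysis would likely adjust the threshold for multiple testing; that step is left out to keep the rule itself visible.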
Keywords/Search Tags: prostate cancer, genome-wide association studies, single nucleotide polymorphisms, expression quantitative trait loci, transcription factor binding site, gene ontology, pathway, network, co-expression