Identification of Non-Random Somatic Mutation Clustering While Accounting for Protein Tertiary Structure: Extensions, Novel Methodologies and Applications to Identifying Oncogenic Driver Mutations

Posted on:2015-04-04

Degree:Ph.D

Type:Thesis

University:Yale University

Candidate:Ryslik, Gregory A

GTID:2474390017498462

Subject:Biology

Abstract/Summary:

Human cancer, defined as uncontrolled and unregulated cellular division, is known to be caused by the accumulation of somatic mutations in the genome. With respect to oncogenes, current theory suggests that there exist only a few key "activating" (or "driver") mutations which are responsible for tumorigenesis. Further, many of these driver mutations are caused by missense substitution mutations, as the more radical frame-shift, insertion, and deletion mutations are significantly more likely to simply result in loss-of-function or protein death once the final polypeptide chain is synthesized.

Recently, a large variety of methods have been developed to identify possible driver mutations in order to determine potential pharmacological targets. Many of these methods leverage the hypothesis that activating mutations occur at only a few key positions, resulting in mutational "clusters". We present an improvement to these cluster-finding algorithms by accounting for the protein tertiary structure. By combining the mutational information available in the Catalogue of Somatic Mutations in Cancer (COSMIC) with the protein structure information available in the Protein Data Bank (PDB), we are able to increase our ability to detect mutational clustering, and hence, potential activating mutations.

We first present iPAC, which accounts for protein tertiary structure by remapping the protein down to one-dimensional space via Multi-Dimensional Scaling (MDS). The linear Nonrandom Mutation Clustering (NMC) algorithm is then run to identify clusters on the remapped protein. We show that this methodology identifies several oncogenic proteins, such as EGFR and EIF2AK2, which are missed when the protein tertiary structure is not considered. We also show that iPAC identifies several novel clusters in proteins previously known to contain clustering, such as KRAS and PIK3Calpha. The next algorithm, GraphPAC, utilizes a graph-theoretic approach for remapping the protein into one-dimensional space. By utilizing graph theory, GraphPAC is able to avoid the global remapping performed by MDS and instead considers only local amino acids during the remapping phase. This approach is more reflective of the local topology of proteins, namely that protein domains are connected by domain linkers and that one may not want residues in different domains to have an effect on each other's final position after remapping. We show that GraphPAC identifies clustering within several known oncogenic proteins, such as DPP4 and NRP1, which are otherwise missed. Our third and final method, SpacePAC, employs a novel simulation approach that identifies mutational clusters directly in 3D space. By considering the protein directly in its natural topology, as well as avoiding the multiple-comparison adjustment required by NMC, iPAC and GraphPAC, we are able to increase power and once again identify novel results, especially at more stringent significance levels. In addition to identifying new proteins that contain mutational clusters, such as CHRM2 and FGFR3, we show that SpacePAC often yields improved cluster localization.
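
The one-dimensional remapping step described for iPAC can be illustrated with a minimal sketch. This is not the thesis implementation: it assumes classical (Torgerson) MDS applied to residue coordinates, and the function name `classical_mds_1d` and the toy hairpin coordinates are hypothetical, chosen only to show how residues that are far apart in sequence but close in 3D space end up adjacent after remapping.

```python
import numpy as np

def classical_mds_1d(coords):
    """Embed 3D points into 1D using classical (Torgerson) MDS.

    Hypothetical helper for illustration; not the iPAC code itself.
    """
    coords = np.asarray(coords, dtype=float)
    n = coords.shape[0]
    # Pairwise squared Euclidean distances between residues.
    diff = coords[:, None, :] - coords[None, :, :]
    sq_dist = (diff ** 2).sum(axis=-1)
    # Double-center the squared distances to obtain a Gram matrix.
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ sq_dist @ centering
    # eigh returns eigenvalues in ascending order; the last one is largest.
    eigvals, eigvecs = np.linalg.eigh(gram)
    top_val = max(eigvals[-1], 0.0)
    return eigvecs[:, -1] * np.sqrt(top_val)

# Assumed toy "protein": six residues folded into a hairpin, so that
# residues 0 and 5 are distant in sequence but close in space.
toy_coords = np.array([
    [0.0, 0.0, 0.0],
    [4.0, 0.0, 0.0],
    [8.0, 0.0, 0.0],
    [8.0, 1.0, 0.0],
    [4.0, 1.0, 0.0],
    [0.0, 1.0, 0.0],
])

embedding = classical_mds_1d(toy_coords)
# Gap between sequence-distant but spatially close residues 0 and 5.
gap_fold_partners = abs(embedding[0] - embedding[5])
# Gap between sequence neighbours 0 and 1.
gap_sequence_neighbours = abs(embedding[0] - embedding[1])
print(gap_fold_partners, gap_sequence_neighbours)
```

In this toy case the two ends of the hairpin collapse onto nearly the same one-dimensional position, so a linear cluster-finding method run on the remapped order could group mutations at residues 0 and 5 that it would treat as far apart in the original sequence.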

Keywords/Search Tags:

Protein, Mutations, Somatic, Novel, Clustering, Driver, Mutational, Clusters

Related items

1	Genetic Determinants Of The Somatic Mutational Processes In Cancers Reveal Potential Driver Genes Of Cancer Evolution
2	Study Of Methods Of Driver Mutation Clusters Identification Based On Analysis Of Multiple Cancer
3	Study Of The Method Of Mining The Patterns Of Driver Mutation In Pan-cancer
4	Somatic Mutations Reveal The Pathogenesis Of Gastric Cancer Induced By The Risk Factors Exposure
5	The Landscape Of Somatic Mutations And The Functions Of Major Mutant Genes In Acral Melanoma
6	Mutational Profiling Of A Long-term Surviving Stage Ⅲ Colorectal Cancer Patient Using High Throughput Next-Generation Sequencing
7	Development Of Methods For Melanoma Immune-molecular Profiling And Identification Of Tumor-associated Mutation Site
8	Mitochondrial DNA somatic mutations and autoimmunity
9	Computational assessment of somatic missense mutations detected in tumor sequencing studies with cancer-specific high-throughput annotation of somatic mutations (CHASM)
10	Study On The Correlation Between NAP1L1 D349E Somatic Variant And Hypertrophic Cardiomyopathy And Risk Gene Somatic Variant In Aortic Dissectio