
Identification of Non-Random Somatic Mutation Clustering While Accounting for Protein Tertiary Structure: Extensions, Novel Methodologies and Applications to Identifying Oncogenic Driver Mutations

Posted on: 2015-04-04
Degree: Ph.D
Type: Thesis
University: Yale University
Candidate: Ryslik, Gregory A
GTID: 2474390017498462
Subject: Biology
Abstract/Summary:
Human cancer, defined as uncontrolled and unregulated cellular division, is known to be caused by the accumulation of somatic mutations in the genome. With respect to oncogenes, current theory suggests that there exist only a few key "activating" (or "driver") mutations which are responsible for tumorigenesis. Further, many of these driver mutations are caused by missense substitution mutations, as the more radical frame-shift, insertion and deletion mutations are significantly more likely to simply result in loss-of-function or protein death once the final polypeptide chain is synthesized.

Recently, a large variety of methods have been developed to identify possible driver mutations in order to determine potential pharmacological targets. Many of these methods leverage the hypothesis that activating mutations occur at only a few key positions, resulting in mutational "clusters". We present an improvement to these cluster-finding algorithms by accounting for the protein tertiary structure. By combining the mutational information available in the Catalogue of Somatic Mutations in Cancer (COSMIC) with the protein structure information available in the Protein Data Bank (PDB), we are able to increase our ability to detect mutational clustering, and hence, potential activating mutations.

We first present iPAC, which accounts for protein tertiary structure by remapping the protein down to one-dimensional space via Multi-Dimensional Scaling (MDS). The linear NMC algorithm is then run to identify clusters on the remapped protein. We show that this methodology identifies several oncogenic proteins, such as EGFR and EIF2AK2, which are missed when the protein tertiary structure is not considered. We also show that iPAC identifies several novel clusters in proteins previously known to contain clustering, such as KRAS and PIK3Cα. The next algorithm, GraphPAC, utilizes a graph-theoretic approach for remapping the protein into one-dimensional space. By utilizing graph theory, GraphPAC is able to avoid the global remapping performed by MDS and instead considers only local amino acids during the remapping phase. This approach is more reflective of the local topology of proteins, namely that protein domains are connected by domain linkers and that one may not want residues in different domains to have an effect on each other's final position after remapping. We show that GraphPAC identifies clustering within several known oncogenic proteins, such as DPP4 and NRP1, which are otherwise missed. Our third and final method, SpacePAC, employs a novel simulation approach that identifies mutational clusters directly in 3D space. By considering the protein directly in its natural topology, as well as avoiding the multiple comparison adjustment required by NMC, iPAC and GraphPAC, we are able to increase power and once again identify novel results, especially at more stringent significance levels. In addition to identifying new proteins that contain mutational clusters, such as CHRM2 and FGFR3, we show that SpacePAC often yields improved cluster localization.
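
To make the remapping step described for iPAC more concrete, the following is a minimal sketch of classical (Torgerson) MDS projecting residue coordinates onto a single axis. It is not the thesis's implementation: the function name `classical_mds_1d`, the choice of classical MDS as the MDS variant, the power-iteration solver, and the toy hairpin coordinates are all assumptions made for illustration. A real analysis would likely take residue positions from a PDB structure and use a numerical library.

```python
import math
import random


def classical_mds_1d(coords, iterations=500, seed=0):
    """Project 3D points onto one axis with classical (Torgerson) MDS.

    coords: list of (x, y, z) tuples, one per residue (hypothetical input).
    Returns one coordinate per residue; the overall sign is arbitrary.
    """
    n = len(coords)
    # Squared Euclidean distance between every pair of residues.
    d2 = [[sum((a - b) ** 2 for a, b in zip(p, q)) for q in coords]
          for p in coords]
    # Double-center the squared distances: B = -1/2 * J * D2 * J.
    row_mean = [sum(row) / n for row in d2]
    total_mean = sum(row_mean) / n
    b = [[-0.5 * (d2[i][j] - row_mean[i] - row_mean[j] + total_mean)
          for j in range(n)] for i in range(n)]
    # Power iteration for the leading eigenvector of B, started from a
    # seeded random vector so the result is reproducible.
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in range(n)]
    start_norm = math.sqrt(sum(x * x for x in v))
    v = [x / start_norm for x in v]
    eigenvalue = 0.0
    for _ in range(iterations):
        w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            # All points coincide: every residue maps to the origin.
            return [0.0] * n
        v = [x / norm for x in w]
        eigenvalue = norm  # v had unit length, so |B v| estimates the eigenvalue
    # Scale the unit eigenvector by sqrt(eigenvalue) to get 1D positions.
    scale = math.sqrt(eigenvalue)
    return [scale * x for x in v]


# Toy "hairpin": residues 0 and 7 are far apart in sequence but adjacent in 3D.
hairpin = [(0, 0, 0), (1, 0, 0), (2, 0, 0), (3, 0, 0),
           (3, 1, 0), (2, 1, 0), (1, 1, 0), (0, 1, 0)]
positions = classical_mds_1d(hairpin)
```

In this toy example, residues 0 and 7 land at nearly the same remapped position even though they are seven steps apart in the sequence, which is the kind of structural proximity a purely linear clustering method cannot see. A linear cluster-finding step such as NMC would then be run on the remapped ordering.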
Keywords/Search Tags: Protein, Mutations, Somatic, Novel, Clustering, Driver, Mutational, Clusters