Characterizing sequence features of human pathogenic variations and impact on post-transcriptional regulation

Posted on: 2017-01-03  Degree: Ph.D.  Type: Thesis
University: Indiana University  Candidate: Zhang, Xinjun  Full Text: PDF
GTID: 2463390014974112  Subject: Bioinformatics
Abstract/Summary:
Some diseases can be understood through genome-wide association studies, which identify genomic regions whose sequence differs between individuals with and without the disease. Notably, the substitution of a single nucleotide, or the insertion or deletion of multiple nucleotides, can seemingly indicate disease. Distinguishing pathogenic genetic variants from benign ones is one of the primary challenges in genetic and next-generation sequencing studies. Several computational tools have been developed to predict the effects of coding non-synonymous variants on protein structure and function, e.g., SIFT, PROVEAN and PolyPhen-2. Recent studies have revealed that synonymous variants and micro-insertions/deletions can affect the alternative splicing process and cause exon skipping, and that they also play a vital role in various diseases. However, the lack of efficient algorithms for prioritizing these two types of variants has greatly hindered the discovery of disease-causing variants in an era of ever-increasing complexity in genetic studies.
In my thesis, I have investigated the mechanisms by which synonymous variants and INDELs affect alternative splicing and subsequently lead to disease. To prioritize synonymous variants, a computational algorithm was developed based on their impact on alternative splicing, protein structure and function. In addition to genomic features related to splicing regulation, my algorithm also includes dozens of structural features that characterize the impact of alternatively spliced exons on protein function. Small insertions/deletions (INDELs) of ≤ 21 bp comprise 18% of all recorded mutations causing human inherited disease and are evident in 24% of documented Mendelian diseases. The potential of INDELs to affect the binding-site affinity of RNA-binding proteins (RBPs), sequence features, and protein function was evaluated, and features were identified that can distinguish disease-causing INDELs from non-pathogenic INDELs. For both categories of variants, web-accessible tools were also developed to ascertain which newly observed variants are likely to be pathogenic. Accurately prioritizing deleterious genetic variation can significantly advance our understanding of disease etiology.
Keywords/Search Tags: Disease, Pathogenic, Variants, Features, Impact, Genetic