Algorithms for characterizing structural variation in human genome

Posted on:2011-08-02

Degree:Ph.D

Type:Thesis

University:Case Western Reserve University

Candidate:Yavas, Gokhan

Full Text:PDF

GTID:2443390002464568

Subject:Biology

Abstract/Summary:

Until fairly recently, single nucleotide polymorphisms (SNPs) were thought to be the main source of variation in the human genome. With the advent of high-throughput genome scanning technologies, it has become clear that other forms of genomic variation exist beyond single base-pair substitutions. These structural alterations include insertions, deletions, inversions, translocations, tandem repeats of DNA sequences and copy number variants (CNVs). Collectively, these alterations are referred to as structural variations.

CNVs are segments of the genome that are polymorphic with regard to genomic copy number. Copy number polymorphisms (CNPs), which can be considered a specific category of CNVs, are defined as copy number variants that are present, with identical boundaries (and are therefore likely identical-by-descent), in at least 1% of the human population. Tandem repeats, on the other hand, are serially repeated segments of the human genome whose repeat units may be several hundred kilobases in size.

CNVs, which have been shown to play a role in various diseases such as Alzheimer's disease, Crohn's disease, autism and schizophrenia, can be caused by structural mutations such as duplications and deletions. In the effort to scan the entire genomes of human populations, as well as individuals, for CNVs (including CNPs) and tandem repeats, SNP arrays and paired-end sequence mapping data have emerged as important tools.

In this thesis, we study the problem of identifying CNVs, CNPs and tandem repeats from these data sources. We first frame CNV identification as an optimization problem with an objective function that is explicitly designed so that its optimal solution is the most accurate set of CNV calls. Our method, termed COKGEN, searches for the best solution using a variant of the well-known simulated annealing heuristic. Next, we present a method for identifying and genotyping common CNPs. The proposed method, POLYGON, draws strength from multiple samples to produce copy number genotypes of the samples at each CNP and fine-tune its boundaries. Finally, we present a novel graph-theoretical method for determining tandem repeats from paired-end read data obtained from massively parallel paired-end sequencing of the target genome.
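
To make the idea of treating CNV identification as an optimization problem more concrete, the following is a minimal sketch of generic simulated annealing on a toy copy-number labeling task. It is not the COKGEN objective or algorithm, which the abstract does not specify. The expected ratios in `EXPECTED`, the `JUMP_PENALTY` value, the cooling schedule and the sample `ratios` are all assumptions chosen purely for illustration.

```python
import math
import random

# Hypothetical toy model: each marker has a log2 intensity ratio, and each
# copy-number state has an assumed expected ratio. These values are
# illustrative only and are not taken from the thesis.
EXPECTED = {1: -1.0, 2: 0.0, 3: 0.58}
JUMP_PENALTY = 0.5  # assumed cost for each change of state between neighbors


def cost(ratios, states):
    """Squared misfit to the expected ratios plus a penalty per breakpoint."""
    fit = sum((r - EXPECTED[s]) ** 2 for r, s in zip(ratios, states))
    jumps = sum(1 for a, b in zip(states, states[1:]) if a != b)
    return fit + JUMP_PENALTY * jumps


def anneal(ratios, steps=20000, t_start=1.0, t_end=0.01, seed=0):
    """Generic simulated annealing; returns the best labeling seen and its cost."""
    rng = random.Random(seed)
    current = [2] * len(ratios)  # start from "no variation" everywhere
    current_cost = cost(ratios, current)
    best, best_cost = list(current), current_cost
    for step in range(steps):
        # Geometric cooling from t_start toward t_end.
        temp = t_start * (t_end / t_start) ** (step / steps)
        # Propose changing the state of one randomly chosen marker.
        candidate = list(current)
        i = rng.randrange(len(ratios))
        candidate[i] = rng.choice([s for s in EXPECTED if s != current[i]])
        candidate_cost = cost(ratios, candidate)
        delta = candidate_cost - current_cost
        # Metropolis rule: always accept improvements, sometimes accept worse moves.
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            current, current_cost = candidate, candidate_cost
            if current_cost < best_cost:
                best, best_cost = list(current), current_cost
    return best, best_cost


# Made-up ratios with a planted loss (markers 3-5) and gain (markers 8-9).
ratios = [0.02, -0.05, 0.01, -0.95, -1.1, -1.02, 0.03, 0.0, 0.61, 0.55, 0.04]
calls, score = anneal(ratios)
print(calls, score)
```

The breakpoint penalty is one simple way to favor contiguous segments over isolated single-marker calls. A real objective would also need to model array noise and marker spacing.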

Keywords/Search Tags:

Genome, Human, Tandem repeats, Variation, Structural, Method

Related items

1	Understanding The Contribution Of Tandem Repeats To Genetic Variations In Bumblebee Genome
2	Population Genetic Study Of Porcine Whole-genome Short Tandem Repeats And Their Effects On Gene Expression In Liver Tissues
3	Characterization And Functional Annotation Of The Genes Containing Tandem Repeats In B. Oleracea, B. Rapa And B. Napus
4	Distribution Of Tandem Repeats In Genomes Of Wheat And Related Species And Its Application For Chromosome Identification
5	Analysis Of Tandem Repeats In Crab Portunus Trituberculatus Genome And Microsatellite Marker Screening
6	Study On Structural Features And Phylogenetic Evolution Of Repetitive Sequences In Bovinae Genome
7	The Distribution Characteristics Of Different Fragments Of Tandem Repeats On Triticeae Chromosomes And Identification Of Dasypyrum Villosum Chromosomes In Wheat Backgrounds
8	Study On The Expression Difference Of 1RS Specific Genes And Tandem Repeats Of 1RS.1BL Translocation Chromosome
9	Tandem Repeats Analysis Of Yesso Scallop Genome
10	Computational analysis on genomic variation: Detecting and characterizing structural variants in the human genome