
Identification of tandem repeats: Simple and complex pattern structures in DNA sequences

Posted on: 2003-04-11
Degree: Ph.D
Type: Dissertation
University: The University of Wisconsin - Madison
Candidate: Hauth, Amy Michelle
Full Text: PDF
GTID: 1464390011984611
Subject: Computer Science
Abstract/Summary:
Sequence duplication is one process that enables DNA to adapt flexibly and evolve in a changing environment. Duplication creates sequence repetition that may mutate over time to form unique sequence. Repetitive DNA is of biological interest because of its role in evolution, its association with human congenital diseases and cancer, its occurrence both within genes and as units that contain genes, and its regulatory function. Despite the importance of repetitive DNA, locating and characterizing repetitive patterns within anonymous DNA sequences remains a challenge. In part, the difficulty is due to imperfect pattern conservation and complex pattern structures. This dissertation describes and identifies complex pattern structures associated with tandem repeats, and locates non-contiguous regions of similarity associated with interspersed repeats, gene clusters and other dispersed, related sequences.
The difficulty of locating and characterizing tandem repeat regions can be attributed, in part, to the formation of complex pattern structures and to imperfect pattern conservation. This research defines a class of regular tandem repeats (RegTRs), as well as two important subclasses: variable length tandem repeats (VLTRs) and multi-periodic tandem repeats (MPTRs). A tandem repeat identification algorithm locates and characterizes regions having simple pattern structures as well as the complex pattern structures associated with VLTRs and MPTRs, without prior knowledge of the nature of the tandem repetition. Furthermore, the algorithm identifies degenerate MPTRs, VLTRs and regions with simple pattern structures, that is, imperfectly conserved repeats containing substitutions, insertions and deletions.
An extension to the tandem repeat identification algorithm locates similarity between two non-contiguous regions. A proof-of-concept algorithm locates Alu sequences, long terminal repeats, related tandem repeat regions, distant yet similar genes, gene clusters and other similar features in DNA sequences.
Access to these algorithms is available through a collection of MME-based webpages generated by a companion program. A webpage interface enables a researcher to submit a DNA sequence for analysis. Once analysis is complete, the program generates a webpage that displays tandem repeat regions and regions of similarity in the sequence in several forms: as graphic images, as alignments of copies within a tandem repeat region, and as tables containing region-specific information.
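To make the simplest case concrete, the following is a minimal sketch of locating exact tandem repeats with a single fixed-length pattern. It is not the dissertation's algorithm: it does not handle VLTRs, MPTRs, or degenerate repeats with substitutions, insertions and deletions. The function name `find_exact_tandem_repeats` and the parameters `max_period` and `min_copies` are hypothetical, chosen only for illustration.

```python
def find_exact_tandem_repeats(seq, max_period=6, min_copies=3):
    """Return (start, period, copies, unit) tuples for exact tandem repeats.

    Illustrative only: perfect repeats of one fixed-length unit.
    Assumes min_copies >= 2.
    """
    seq = seq.upper()
    n = len(seq)
    hits = []
    i = 0
    while i < n:
        best = None  # (span, period, copies) of the best repeat starting at i
        for period in range(1, max_period + 1):
            # Extend while each base equals the base one period earlier.
            j = i + period
            while j < n and seq[j] == seq[j - period]:
                j += 1
            copies = (j - i) // period  # whole copies of the unit only
            if copies >= min_copies:
                span = copies * period
                # Keep the longest span; on ties the smaller period wins
                # because periods are tried in increasing order.
                if best is None or span > best[0]:
                    best = (span, period, copies)
        if best is not None:
            span, period, copies = best
            hits.append((i, period, copies, seq[i:i + period]))
            i += span  # skip past the reported region
        else:
            i += 1
    return hits


if __name__ == "__main__":
    for hit in find_exact_tandem_repeats("GGCACACACATT"):
        print(hit)
```

The scan is greedy and reports non-overlapping regions from left to right. Tolerating imperfect conservation would require an alignment-based comparison of adjacent copies rather than the exact character match used here.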
Keywords/Search Tags:DNA, Tandem repeat, Complex pattern structures, Sequence, Simple, Identification