Font Size: a A A

Genome Rearrangement For Partially Ordered Genomes

Posted on:2012-08-05Degree:DoctorType:Dissertation
Country:ChinaCandidate:H Y ChenFull Text:PDF
GTID:1480303353952959Subject:Operational Research and Cybernetics
Abstract/Summary:PDF Full Text Request
Genome rearrangement is an important research area of computational biol-ogy and bioinformatics. Many research results show that genome rearrangement is a common mode for biological evolution and one main reason for the diversities of plants, mammals and bacteria.The analysis of genome rearrangements in molecular biology was pioneered by Dobzhansky and Sturtevant,1938, who published a milestone paper presenting a rearrangement scenario with 17 inversions for the species of Drosophila fruit fly. In the simplest form, rearrangements can be modeled by using a combinatorial problem of finding a shortest series of reversals to transform one genome into another. With the advent of large-scale mapping and sequencing, the number of genome comparison problems is rapidly growing in different areas, including viral, bacterial, yeast, plant, and animal evolution. In the late 1980s, Jeffrey Palmer and his collegues discovered a remarkable and novel pattern of evolutionary change in plant organelles. They compared the mitochondrial genomes of Brassica oleracea and Brassica campestris, which are very closely related (many genes are 99% identical). To their surprise, these molecules, which are almost identical in gene sequences, differ dramatically in gene order. This discovery and many other studies in the last decade convincingly proved that genome rearrangements represent a common mode of molecular evolution [49].A genome is a set of chromosomes and a chromosome is a sequence of genes each represented by an integer. Each gene has a direction. When the direction of each gene in a signed genome is known, we use a signed integer to indicate the gene. When the direction of each gene in an unsigned genome is unknown, we use an unsigned integer to represent the gene. Given two genomes, the problem of genome rearrangement is to compute a shortest sequence of rearrangement operations that transforms one genome into the other. For the problem of sorting signed genomes, a rearrangement operation can change both the gene sequence and the signs of genes. For the problem of sorting unsigned genomes, a rearrangement operation can only change the gene sequence, but cannot change the signs of genes.Although genome rearrangement is a complicated process, there are several basic operations:reversal, translocation and transposition. Simply, when we con-sider the genome rearrangement, we always assume that there are no repetitive genes in the genome. Thus, a genome with n genes can be represented by a per-mutation on{1,2,..., n}. If a genome can be transformed into another genome only by translocations, the two genomes must have the same gene content. Of course, such is rarely in biological practice. So we consider the general case when the two genomes have different gene contents. Clearly, in such case, deletions or insertions are needed. Our goal is still to transform one genome into the other genome with a shortest of rearrangement operations (translocations, deletions or insertions).Most comparative genomics studies assume implicitly that the total order of genes on a chromosome has been identified. However, because of the lack of resolution, current genetic mapping techniques such as recombination analysis and physical imaging often generate gene maps that have several genes at the same position and/or are missing some other genes. Thus, combining all the gene maps only suffice to produce a partial order rather than a total order.This thesis mainly studies the genome rearrangement for partially ordered genomes. We construct our work in four chapters. In Chapter One, we introduce the basic concepts and algorithms of genome rearrangement.The objective of Chapter Two is to deal with the partially ordered break-point distance (PBD) problem, that is, given two partially ordered genomes?and r with the same gene content, we need to find a possible total order L(?) and L(?), respectively, such that the breakpoint distance db(?,?) between L(?) and L(?) minimized. In this chapter, we present a heuristic algorithm to compute the breakpoint distance between two partially ordered genomes, which runs in O(n4) worst-case time, where n is the number of genes in the considered genome. Firstly, we introduce a new combinatorial optimization problem-Minimum Double Break-point Vertex Set (MDBVS) problem and give a 2(m2+4m-3)-approximation algorithm for this problem, where m is the maximum number of gene maps that are combined together to form the two partially ordered genomes, respectively. Then we present a heuristic algorithm for the PBD problem based on the relation-ships between them. The following theorem establishes the relationships between the PBD problem and MDBVS problem.Theorem Let |MDBVS| be the size of the minimum double breakpoint vertex set,|W| be the size of W, n be the size of genes in?. Then db(?,?)= n-1-|W|+|MDBVS|.If |W|<n, where W denotes the set of possible commom adjacencies, the above theorem implies that a 2(m2+4m-3)-approximation algorithm for the MDBVS problem can be easily translated into a 2(m2+4m-3)-approximation algorithm for the PBD problem.In computational biology, if a group of homologous genes (that are genes hav-ing a common ancestry) is co-localized in two different species, then those genes were probably together in the common ancestor and were not later separated dur-ing evolution. Such conserved clusters of homologous genes are called common intervals [32]. Recently, another combinatorial framework for genome rearrange-ment was proposed, called perfect sorting problem. It is infer a minimum number of rearrangement operations which transform one genome into another genome and don't break any common interval of the considered genome.In Chapter Three, we extend Berard's [5] algorithm of solving the problem of perfect sorting by reversals to allow deletions and a limited form of insertions (which forbids duplications). In Chapter Four, we study the problem of genome rearrangement for partially ordered genomes which have different gene contents, that is, given two partially ordered genomes that have different gene contents, we need to find a possible total order for them, respectively, such that the number of rearrangement operations which transform one into the other is minimized. Obviously, in such case, inser-tions or deletions are needed. In this chapter, we extend Zheng's algorithm [70] and present an algorithm to solve the partially ordered rearrangement problem, allowing deletion and insertion operations, and illustrate the feasibility of our algorithm.
Keywords/Search Tags:genome rearrangement, breakpoint distance, reversal, dele-tion, insertion, algorithm
PDF Full Text Request
Related items