Font Size: a A A

Modeling And Algorithm Of Whole Genome Structural Variation Problems

Posted on:2019-03-12Degree:MasterType:Thesis
Country:ChinaCandidate:S S ZhaiFull Text:PDF
GTID:2370330542996933Subject:Software engineering
Abstract/Summary:PDF Full Text Request
Genome rearrangement is the behavior that genome changes the sequence of genes,which can be formalized as several basic operations such as reversal,translocation,transposition and so on.Genome rearrangement is also a typical structural variation that leads to the evolution of life characteristics.The sorting of genome rearrangement,which requires to find the shortest sequence of rearrangement operations,transforms one genome into another genome.This is a combinatorial optimization model to infer the evolution of life and compare genetic relationship.The sorting of genome rearrangement that each gene in the genome appears only once has the extensive and deep research progress.Berman et al.presented an algorithm for the problem of sorting unsigned genomes with approximate performance ratio of 1.375,which is currently the best.Hannenhalli et al.gave a polynomial algorithm of sorting signed genomes by reversals at first,Bader and others designed an algorithm to calculate the reversal distance with time complexity of O(n),and gave an algorithm to obtain a new sequence after reversals whose time complexity is O(n2).For the sorting problems on signed genomes by translocations,Li et al.proposed an algorithm to calculate translocation distance with time complexity of O(n).For the sorting problems on unsigned genomes by translocations,Cui et al.designed an 1.75-approximate algorithm with time complexity of O(n2),and the 1.375-approximation algorithm proposed by Pu is the best algorithm to solve this problem at present.Genes are often repeated many times in the genome.However,there is still a lack of effective combinatorial optimization problem models and algorithms for the comparison of whole genomic structural differences with gene duplication multiple times.This paper establishes a combinatorial optimization problem model to describe the rearrangement distance between two genomes with duplicate genes.The problem of sorting by reversals on two genomes with duplicate genes can be described as a combinatorial optimization problem as follows:given a source gene sequence ? and a target gene sequence ?,two gene sequences have same genes,the goal is to output a new gene sequence ?' by a certain number of reversal operations on gene sequence,eliminate all the breakpoints between two gene sequences ? and ? and ensure that the number of reversals is minimal.We always represent a gene sequence as a character sequence,one character represents a gene.This paper presents two algorithms to solve the problem of sorting gene sequence by reversals with duplicate genes.(1)Give a 4-approximate algorithm with the time complexity of O(n3).We define block,samelocationblock,oppositelocationblock using the concepts of breakpoint and adjacency.Make classified discussions on the differences between samelocationblock and oppositelocationblock by different reversal operations,design specific reversal operations on the breakpoints except samelocations and oppositelocations.Give an approximate algorithm to eliminate all the breakpoints in the gene sequence,and prove that every two reversal operations can eliminate at least one breakpoint,thus proving that the approximate performance ratio of the algorithm is less than 4.(2)Define the breakpoint graph that expresses the reversal distance of two genomes with duplicate genes.Give a precise algorithm for the sorting problems on genomes with duplicate genes when the breakpoint graph is Non-Cross Graph.Analyze three simple structures of Non-Cross Graph,do any reversals on three simple structures of Non-Cross Graph and make classified discussions according to the different positions of reversal substrings,give the upper bound of reversal distance represented by the parameters of Non-Cross Graph.Prove that any one of Non-Cross Graphs can be reduced into the simple structure of Non-Cross Graph.Thus proving that the previous upper bound of reversal distance can be applied in any one of Non-Cross Graphs.We obtain the reversal distance by the accurate algorithm that equals to the previous upper bound,so the algorithm can obtain the optimal reversal distance of the sorting problems by reversals on genomes with duplicate genes.There are three main innovations of this paper:1.Build a problem model of genome rearrangement with duplicate genes.Before this,people studied the problems of genome rearrangement on genomes with no duplicate genes.2.Find and define 'block' in the gene sequence,and design a 4-approximate algorithm for the sorting problems of genome rearrangement by reversals.3.Build a breakpoint graph model representing the reversal distance between two genomes with duplicate genes,give a polynomial accurate algorithm on the sorting problems of genome rearrangement by reversals representing by Non-Cross Graph,then give the proof of optimality.
Keywords/Search Tags:Reversal, Adjacency, Breakpoint, Block, Non-Cross Graph
PDF Full Text Request
Related items