An Anchor-based Algorithm For Multiple Genome Alignment

Posted on:2011-06-01

Degree:Master

Type:Thesis

Country:China

Candidate:S C Miao


GTID:2178330332988254

Subject:Computer software and theory

Abstract/Summary:


Multiple genome alignment is one of the most important fundamental problems in modern bioinformatics. To allow direct comparison of the genome sequences of sufficiently similar organisms, there is an urgent need for software tools that can align more than two genome sequences. However, most current research focuses on pairwise genome alignment, and the few available applications that can align multiple genomes efficiently do so with low identification efficiency. In this paper, we present an efficient algorithm with improved identification efficiency for aligning multiple closely related whole genomes. It combines suffix arrays, conserved regions, a graph-theoretic formulation, and existing tools for aligning gaps (short sequences). The algorithm first finds a longest increasing subsequence (LIS) set of aligned conserved regions across the genomes, then aligns the gaps between consecutive conserved regions with ClustalW. We present experimental results for the algorithm and analyze them. The experiments use six sets of DNA sequences (human, mouse, mycoplasma, etc.) for multiple sequence alignment. They show that our algorithm improves both identification efficiency and running time compared with other methods of comparable accuracy, such as MGA and EMAGEN. The algorithm is also shown to be feasible and efficient for aligning multiple sequences.
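The anchor-chaining step described above can be illustrated with a minimal sketch. It is not the thesis's implementation: it assumes only two genomes, assumes the conserved regions (anchors) have already been found and do not overlap, and represents each anchor as a hypothetical tuple `(pos_a, pos_b, length)`. The function names `chain_anchors` and `gaps_between` are illustrative. The sketch keeps the longest set of anchors whose positions increase in both genomes (a standard LIS computation) and then lists the gap intervals that would be passed to a short-sequence aligner such as ClustalW.

```python
from bisect import bisect_left


def chain_anchors(anchors):
    """Return the longest chain of anchors that is strictly increasing
    in both genomes.

    anchors: list of (pos_a, pos_b, length) tuples (a hypothetical
    representation; overlap between anchors is assumed not to occur).
    """
    # Sort by position in genome A. Ties are ordered by descending pos_b
    # so that two anchors sharing a pos_a can never both enter the chain.
    ordered = sorted(anchors, key=lambda a: (a[0], -a[1]))

    tails = []       # tails[k]: index of the last anchor of the best chain of length k + 1
    tail_keys = []   # pos_b of those anchors, kept sorted for binary search
    prev = [-1] * len(ordered)  # back-pointers for reconstructing the chain

    for i, (_, pos_b, _) in enumerate(ordered):
        k = bisect_left(tail_keys, pos_b)
        if k > 0:
            prev[i] = tails[k - 1]
        if k == len(tails):
            tails.append(i)
            tail_keys.append(pos_b)
        else:
            tails[k] = i
            tail_keys[k] = pos_b

    # Walk the back-pointers from the end of the longest chain.
    chain = []
    i = tails[-1] if tails else -1
    while i != -1:
        chain.append(ordered[i])
        i = prev[i]
    chain.reverse()
    return chain


def gaps_between(chain):
    """Return, for each pair of consecutive anchors, the half-open
    intervals in genome A and genome B that lie between them."""
    gaps = []
    for (a1, b1, n1), (a2, b2, _) in zip(chain, chain[1:]):
        gaps.append(((a1 + n1, a2), (b1 + n1, b2)))
    return gaps


if __name__ == "__main__":
    # Made-up anchors for illustration only; (10, 40, 5) breaks the order.
    example = [(0, 0, 5), (10, 40, 5), (20, 12, 4), (30, 25, 6), (50, 60, 3)]
    chain = chain_anchors(example)
    print(chain)
    print(gaps_between(chain))
```

Extending this to more than two genomes requires a chain that is increasing in every genome at once, which is where a graph-theoretic formulation such as the one named in the abstract would come in; the sketch does not attempt that.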

Keywords/Search Tags:

Multiple Sequence Alignments, Conserved Regions, Suffix Arrays, Graph Theoretic Formulation, Longest Increasing Subsequence


Related items

1	Research On Models And Algorithms For Several Key Problems In Sequence Mining
2	The Research On Algorithms For The Longest Common Subsequence Problem And Variants
3	Explorations on the longest common increasing subsequence problem
4	Algorithms For Sorting By Short Swap And Longest Common Exemplar Subsequence
5	Graph Models And Algorithms Of The Longest Common Sub-sequences For Many Long Sequences
6	The Study Of Heuristics Method For Multiple Sequence Alignments
7	Approximate Longest Common Subsequence Query Processing And Optimization On Biological Sequence
8	Parallel Algorithm For Multiple Longest Common Subsequence And Application Research On Hadoop Platform
9	Research On Protein Multiple Sequence Alignment Algorithms And Assessment Of Their Performance
10	A Simulated Annealing Approach To Multiple Sequence Alignment