Study On Multiple Sequence Alignment And Motif Discovering In Bioinformatics

Posted on: 2007-07-09
Degree: Doctor
Type: Dissertation
Country: China
Candidate: L F Liu
Full Text: PDF
GTID: 1100360212459891
Subject: Computer applications

Abstract/Summary:
Genome sequencing projects have led to rapid growth in publicly available databases of genome sequences. Sequence analysis has therefore become an essential task, and sequence alignment and motif discovery are two of the main methods for analyzing molecular biological sequences. This thesis focuses on sequence alignment and motif discovery in bioinformatics. The main contributions are as follows.

1. To solve the multiple sequence alignment problem in molecular biological sequence analysis, hybrid genetic algorithms are designed. First, the sum-of-pairs (SP) function is used to measure individual fitness, and four genetic operators are designed. Experimental results on benchmarks from BAliBASE Ref. 1 show that the proposed algorithm can align equidistant protein sequences and that the alignment quality is comparable to that obtained with ClustalX.

2. To obtain better solutions and higher accuracy, the COFFEE function is used to measure individual fitness, and an associated software package called PHGA-COFFEE is presented. Six genetic operators are designed, including two novel mutation operators: one is based on COFFEE's consistency information and improves global search ability; the other is implemented with dynamic programming and improves individuals locally. Experimental results on 144 benchmarks from BAliBASE show that the proposed algorithm is feasible. For datasets in the twilight zone and datasets comprising N/C-terminal extensions, PHGA-COFFEE generates better alignments than the other methods studied in this thesis. In addition, the computation time of PHGA-COFFEE is markedly reduced by the parallel algorithm.

3. To estimate the parameters of a profile HMM in multiple sequence alignment, a hybrid genetic algorithm is designed that employs the Baum-Welch (BW) algorithm. The quality of the alignments produced by hybrid GA-HMM training is compared with that of other profile HMM training methods. The experimental results show that it is very competitive with, and even better than, the other tested profile HMM training methods.
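As a rough illustration of the SP fitness mentioned in item 1, the sketch below scores an alignment by summing a pairwise score over every pair of rows in every column. It is only a minimal sketch under stated assumptions: the match, mismatch, and gap values are hypothetical toy numbers, and the function names `pair_score` and `sp_score` are invented here. The thesis most likely uses a substitution matrix and its own gap penalties, which the abstract does not specify.

```python
from itertools import combinations

GAP = "-"


def pair_score(a, b, match=1, mismatch=-1, gap=-2):
    """Score one pair of aligned symbols (hypothetical toy scoring scheme)."""
    if a == GAP and b == GAP:
        return 0  # a gap aligned with a gap contributes nothing
    if a == GAP or b == GAP:
        return gap  # a residue aligned with a gap is penalized
    return match if a == b else mismatch


def sp_score(alignment, **scoring):
    """Sum-of-pairs score: sum pair_score over all row pairs in every column."""
    if len({len(row) for row in alignment}) > 1:
        raise ValueError("all aligned rows must have the same length")
    total = 0
    for column in zip(*alignment):  # iterate over alignment columns
        for a, b in combinations(column, 2):  # every unordered pair of rows
            total += pair_score(a, b, **scoring)
    return total


# Toy alignment of three sequences (illustrative data, not from BAliBASE).
alignment = ["AC-GT", "ACAGT", "A-AGT"]
print(sp_score(alignment))
```

In a genetic algorithm of the kind described, each individual would encode one candidate alignment, and a score like this would serve as its fitness. A higher value indicates more columns in which the sequences agree.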
Keywords/Search Tags: Bioinformatics, multiple sequence alignment, motif discovering, hidden Markov model, parallel hybrid genetic algorithm, Gibbs sampling