Combinatorial and statistical approaches for some challenges in oligonucleotide fingerprinting on ribosomal RNA genes

Posted on: 2007-04-30
Degree: Ph.D.
Type: Dissertation
University: University of California, Riverside
Candidate: Liu, Zheng
Full Text: PDF
GTID: 1440390005964832
Subject: Biology
Abstract/Summary:
Oligonucleotide fingerprinting of ribosomal RNA genes (OFRG) is a high-throughput, cost-effective, array-based method designed to identify microorganisms. During the development of the OFRG method, various computational challenges have arisen. In this work, we present combinatorial and statistical solutions for several critical problems in OFRG.

The first problem is a sequence acquisition problem whose goal is to obtain an rRNA gene sequence database for a specific taxonomic group. In the proposed combinatorial approach, a fast and accurate approximate string-matching algorithm was designed to fetch rRNA gene sequences sandwiched by two given primers from GenBank. A homology search algorithm, which combines a chaining algorithm with the Basic Local Alignment Search Tool (BLAST), was then used to extract rRNA gene sequences that do not contain the primers. An improved string-matching algorithm, called Fast Algorithm for Approximate String maTching (FAAST), was further developed for the approximate string-matching problem. FAAST generalizes the well-known Tarhio-Ukkonen algorithm by requiring two or more matches when calculating shift distances. Both theoretical analysis and experimental results demonstrate that the algorithm achieves a significant speed-up without loss of accuracy.

The second challenge arises in the analysis of microarray data. In OFRG, the presence of specific rRNA gene sequences is determined by the intensity values of hybridization with a series of oligonucleotide probes. Due to noise and technological limitations, these intensity values are sometimes too ambiguous for a reliable classification. In such a situation, the traditional Bayes classification method could lead to an invalid prediction, affecting the accuracy of OFRG. A statistical model called Modified Bayes Rule (MBR) was proposed to allow a "no prediction." MBR formulated a cost structure to weigh the penalty for not making a definite prediction against the penalty for making an incorrect definite prediction. Experiments demonstrated that MBR outperforms a neutral-zone rule that had been routinely used in OFRG.

Finally, software packages that implement the above algorithms and other related methods were developed. A central database was also designed to manage the data from OFRG.
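To make the sequence acquisition step concrete, the following is a minimal sketch of fetching the region sandwiched by two primers while tolerating up to k edit errors in each primer site. It uses the classic dynamic-programming approach to approximate matching, not FAAST or the Tarhio-Ukkonen algorithm, whose details the abstract does not give. The function names (`approx_matches`, `extract_between`) are hypothetical, and the sketch assumes both primers are written on the same strand as the target sequence.

```python
def approx_matches(pattern, text, k):
    """Return (end, distance) pairs: pattern matches a substring of text
    ending at index `end` (exclusive) with edit distance <= k."""
    m = len(pattern)
    # col[i] = edit distance between pattern[:i] and the best substring
    # of text ending at the current text position.
    col = list(range(m + 1))
    hits = []
    for j, ch in enumerate(text, start=1):
        prev_diag = col[0]
        col[0] = 0  # a match may start anywhere in the text
        for i in range(1, m + 1):
            cost = 0 if pattern[i - 1] == ch else 1
            new = min(prev_diag + cost,  # match or substitution
                      col[i] + 1,        # extra character in text
                      col[i - 1] + 1)    # missing character in text
            prev_diag = col[i]
            col[i] = new
        if col[m] <= k:
            hits.append((j, col[m]))
    return hits


def extract_between(text, fwd, rev, k):
    """Return the text between the best forward-primer site and the best
    downstream reverse-primer site, or None if either site is missing."""
    fwd_hits = approx_matches(fwd, text, k)
    if not fwd_hits:
        return None
    # Prefer the lowest distance, then the earliest position.
    fwd_end, _ = min(fwd_hits, key=lambda h: (h[1], h[0]))

    # Matching the reversed pattern against the reversed text yields
    # match *start* positions in the original text.
    n = len(text)
    rev_hits = [(n - e, d) for e, d in approx_matches(rev[::-1], text[::-1], k)]
    rev_hits = [(s, d) for s, d in rev_hits if s >= fwd_end]
    if not rev_hits:
        return None
    rev_start, _ = min(rev_hits, key=lambda h: (h[1], h[0]))
    return text[fwd_end:rev_start]


# Illustrative data only: the forward primer site carries one substitution.
record = "TTTT" + "ACGAACGT" + "GGGGCCCC" + "TTGACCAA" + "AAAA"
print(extract_between(record, "ACGTACGT", "TTGACCAA", k=1))
```

This scan costs time proportional to the pattern length times the text length. Filtering algorithms such as Tarhio-Ukkonen, which FAAST generalizes, aim to skip over parts of the text instead of examining every position.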
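The cost trade-off behind a "no prediction" outcome can be illustrated with a generic Bayes rule that has a reject option. This is a sketch under stated assumptions, not the dissertation's MBR formulation: the Gaussian intensity model, the class parameters, the two cost values, and the name `classify_with_reject` are all hypothetical. The rule predicts the most probable class only when the expected cost of being wrong does not exceed the cost of abstaining.

```python
import math


def gaussian_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and sd at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def classify_with_reject(x, classes, cost_error=1.0, cost_reject=0.2):
    """Classify intensity x, or return None as the label for "no prediction".

    classes maps label -> (prior, mean, sd); all values are assumed here.
    Returns (label_or_None, posterior_dict).
    """
    joint = {label: prior * gaussian_pdf(x, mean, sd)
             for label, (prior, mean, sd) in classes.items()}
    total = sum(joint.values())
    posterior = {label: value / total for label, value in joint.items()}

    best = max(posterior, key=posterior.get)
    # Expected cost of committing to `best` is the chance it is wrong
    # times the penalty for an incorrect definite prediction.
    expected_cost = cost_error * (1.0 - posterior[best])
    if expected_cost > cost_reject:
        return None, posterior  # abstaining is cheaper than guessing
    return best, posterior


# Hypothetical probe model: 0 = no hybridization, 1 = hybridization.
probe_classes = {0: (0.5, 0.0, 1.0), 1: (0.5, 4.0, 1.0)}
for intensity in (0.0, 2.0, 4.0):
    label, _ = classify_with_reject(intensity, probe_classes)
    print(intensity, label)
```

With these assumed costs, an intensity midway between the two class means is left unclassified, while intensities near either mean receive a definite label. Raising `cost_reject` relative to `cost_error` narrows the range of intensities that go unclassified.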
Keywords/Search Tags:OFRG, Gene, Method, Designed, Combinatorial, Statistical