
Research On Gene Structure Annotation Using Genomically Aligned EST Sequences

Posted on: 2008-04-27
Degree: Master
Type: Thesis
Country: China
Candidate: H Jing
GTID: 2120360272969669
Subject: Bio-IT
Abstract/Summary:
Identifying protein-coding genes is a central problem in genome research, and it becomes more pressing as the genomes of more and more species are sequenced. Given the explosive growth of genomic sequence data, traditional biological experiments alone can hardly address the whole problem, which makes high-throughput bioinformatics methods invaluable.

An EST (expressed sequence tag) is a partial sequence read from a clone picked at random from a cDNA library. In principle, an EST contains no introns and represents part of a gene. ESTs are very numerous, their number is growing fast, and they are an extremely valuable resource. Predicting and annotating protein-coding genes with genomically aligned ESTs is therefore an important task. It is far from trivial, however, because of the poor quality of EST sequences and the complexity of the genome.

First, the characteristics of EST sequences were studied. Several factors affecting EST quality were investigated, including foreign sequences, genomic DNA sequences, chimeric sequences, pre-mRNA sequences, random-primed sequences and internally primed sequences, among others. Features of the genome were also studied, such as repeats, pseudogenes, multi-copy genes, overlapping genes, nested genes and alternative splicing.

Based on these analyses, specific measures were proposed for the different situations that arise when ESTs are aligned to the genome. The annotation procedure consists of trimming foreign sequences, locating ESTs on the genome, verifying and testing the alignments, clustering the aligned ESTs, and predicting the final gene structures. In the prediction step, a directed acyclic graph (DAG) algorithm and the expectation-maximization (EM) algorithm were applied to predict alternatively spliced transcripts and rank them by their calculated probability. Evaluation of the results shows that the procedure is effective.
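Trimming foreign sequences is the first step of the procedure above. The sketch below illustrates only the simplest case. It assumes, hypothetically, that the cloning-vector sequences flanking the insert are known and that contamination shows up as an exact overlap at the read ends: the read's 5' end may begin with the tail of the upstream vector, and its 3' end may run into the head of the downstream vector. The function name `trim_foreign`, the vector strings and the 8-base minimum overlap are illustrative assumptions and do not come from the thesis. Real trimming tools also tolerate sequencing errors instead of requiring exact matches.

```python
def trim_foreign(read: str, left_vector: str, right_vector: str,
                 min_overlap: int = 8) -> str:
    """Strip exact vector overlaps from both ends of an EST read (hypothetical helper)."""
    if min_overlap < 1:
        raise ValueError("min_overlap must be at least 1")
    read = read.upper()
    left, right = left_vector.upper(), right_vector.upper()

    # 5' end: remove the longest suffix of the upstream vector that the read starts with.
    for k in range(min(len(left), len(read)), min_overlap - 1, -1):
        if read.startswith(left[-k:]):
            read = read[k:]
            break

    # 3' end: remove the longest prefix of the downstream vector that the read ends with.
    for k in range(min(len(right), len(read)), min_overlap - 1, -1):
        if read.endswith(right[:k]):
            read = read[:-k]
            break

    return read


LEFT = "AATTCGGCACGAG"   # illustrative vector sequences, not taken from the thesis
RIGHT = "CTCGAGGGATCC"
raw = "ggcacgag" + "ATGGCTTCAAAGTGG" + "ctcgaggg"
print(trim_foreign(raw, LEFT, RIGHT))  # ATGGCTTCAAAGTGG
```

The function tries overlaps from longest to shortest, so it removes as much vector as the exact-match rule allows. Requiring a minimum overlap keeps it from clipping a few bases that match the vector by chance.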
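The clustering step groups aligned ESTs that probably come from the same locus. A simple criterion is single-linkage clustering of alignments whose genomic spans overlap on the same chromosome and strand. This criterion is our assumption, not a detail given in the thesis. After sorting by position, a single sweep builds the clusters in O(n log n) time. The `EstHit` record and the `cluster_hits` function are hypothetical. The sketch also assumes that each alignment's strand has already been resolved to the transcribed strand.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class EstHit:
    """One EST alignment on the genome (hypothetical record layout)."""
    est_id: str
    chrom: str
    strand: str  # '+' or '-', assumed already resolved to the transcribed strand
    start: int   # 0-based, inclusive
    end: int     # 0-based, exclusive


def cluster_hits(hits: List[EstHit]) -> List[List[EstHit]]:
    """Single-linkage clusters of hits whose spans overlap on one chromosome strand."""
    ordered = sorted(hits, key=lambda h: (h.chrom, h.strand, h.start, h.end))
    clusters: List[List[EstHit]] = []
    current_key: Optional[Tuple[str, str]] = None
    current_end = -1
    for hit in ordered:
        key = (hit.chrom, hit.strand)
        if key == current_key and hit.start < current_end:
            # Overlaps the running cluster: absorb it and extend the cluster's span.
            clusters[-1].append(hit)
            current_end = max(current_end, hit.end)
        else:
            # Different chromosome/strand, or a gap: open a new cluster.
            clusters.append([hit])
            current_key, current_end = key, hit.end
    return clusters


hits = [
    EstHit("e1", "chr1", "+", 100, 500),
    EstHit("e2", "chr1", "+", 450, 900),
    EstHit("e3", "chr1", "+", 1000, 1200),
    EstHit("e4", "chr1", "-", 120, 480),
]
print([[h.est_id for h in c] for c in cluster_hits(hits)])  # [['e1', 'e2'], ['e3'], ['e4']]
```

Here `e4` overlaps `e1` in coordinates but lies on the opposite strand, so it forms its own cluster.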
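The prediction step above pairs a DAG algorithm with EM. The following minimal sketch shows one way the two can fit together. Its simplifying assumptions are ours, not the thesis's:

- Each spliced EST alignment is reduced to a chain of exon intervals whose boundaries already agree across ESTs.
- Exons become nodes of a splice graph, and exons that are adjacent in any EST are joined by an edge. Edges always point downstream, so the graph is acyclic.
- Every source-to-sink path is treated as a candidate transcript.
- EM estimates each candidate's probability from the ESTs compatible with it. An EST counts as compatible when its exons form a contiguous run of the transcript.
- The model has no transcript-length correction.

All names are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Exon = Tuple[int, int]    # hypothetical: (genomic start, end) of one exon block
Chain = Tuple[Exon, ...]  # exon blocks of one spliced EST alignment, in genomic order


def build_splice_graph(chains: List[Chain]) -> Dict[Exon, List[Exon]]:
    """Nodes are exons; an edge joins two exons that are adjacent in some EST.

    Exons within a chain are in increasing genomic order, so every edge points
    downstream and the graph is a DAG.
    """
    graph: Dict[Exon, Set[Exon]] = {}
    for chain in chains:
        for exon in chain:
            graph.setdefault(exon, set())
        for a, b in zip(chain, chain[1:]):
            graph[a].add(b)
    return {node: sorted(nbrs) for node, nbrs in graph.items()}


def enumerate_transcripts(graph: Dict[Exon, List[Exon]]) -> List[Chain]:
    """Return every path from a source exon (no incoming edge) to a sink exon."""
    has_pred = {b for nbrs in graph.values() for b in nbrs}
    paths: List[Chain] = []

    def walk(node: Exon, prefix: Chain) -> None:
        prefix = prefix + (node,)
        if not graph[node]:
            paths.append(prefix)
        for nxt in graph[node]:
            walk(nxt, prefix)

    for source in sorted(n for n in graph if n not in has_pred):
        walk(source, ())
    return paths


def is_compatible(chain: Chain, transcript: Chain) -> bool:
    """An EST supports a transcript if its exons form a contiguous run of it."""
    n = len(chain)
    return any(transcript[i:i + n] == chain for i in range(len(transcript) - n + 1))


def em_transcript_probabilities(chains: List[Chain], transcripts: List[Chain],
                                n_iter: int = 200) -> List[float]:
    """Estimate the fraction of ESTs drawn from each candidate transcript with EM."""
    compat = [[t for t, tx in enumerate(transcripts) if is_compatible(c, tx)]
              for c in chains]
    compat = [ts for ts in compat if ts]  # ESTs that fit no candidate are ignored
    theta = [1.0 / len(transcripts)] * len(transcripts)
    if not compat:
        return theta
    for _ in range(n_iter):
        counts = [0.0] * len(transcripts)
        for ts in compat:
            # E-step: split this EST among its compatible transcripts by current theta.
            total = sum(theta[t] for t in ts)
            for t in ts:
                counts[t] += theta[t] / total
        # M-step: the new probabilities are the normalised expected counts.
        theta = [c / len(compat) for c in counts]
    return theta


A, B, C, D = (100, 200), (300, 400), (500, 600), (700, 800)
ests = [(A, B, D), (A, B, D), (A, B), (B, D), (A, C, D), (C, D), (A,)]
txs = enumerate_transcripts(build_splice_graph(ests))  # [(A, B, D), (A, C, D)]
probs = em_transcript_probabilities(ests, txs)
for tx, p in sorted(zip(txs, probs), key=lambda x: -x[1]):
    print(tx, round(p, 3))
```

In this example the single-exon EST `(A,)` fits both candidates. EM therefore splits it in proportion to the evidence from the unambiguous ESTs, and the estimates converge to about 0.667 for `(A, B, D)` and 0.333 for `(A, C, D)`.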
A gene annotation platform covering the whole human genome has been set up. A large database with more than 60 million entries now supports the related web services at http://bioinfo.hust.edu.cn.
Keywords/Search Tags: EST, sequence alignment, genome, gene prediction, gene annotation