Sequence Analysis And Comparative Studies Of The Rice Genome

Posted on: 2004-09-25
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y J Zhang
GTID: 1103360122971033
Subject: Biochemistry and Molecular Biology

Abstract/Summary:
Rice is the staple food for over half of the world's population. With its compact genome (about 430 Mb), well-established techniques for high-efficiency genetic transformation, widely available high-density genetic and physical maps, and a high degree of synteny with other cereal genomes, it has become a unique model organism for higher plants. The finished sequence of rice chromosome 4 consists of 287 BACs and 2 PACs with a total length of 34.5 Mb in 8 contigs, covering about 97.3% of the chromosome.

A genome sequence is only as valuable as its annotation, which bridges the gap between the sequence and the biology of the organism. Annotation takes the raw DNA sequence and adds the layers of analysis and interpretation needed to extract its biological significance and place it in the context of our understanding of biological processes. Genome annotation is a multi-step process that falls roughly into three categories: DNA-level, protein-level and process-level. Rice chromosome 4 was annotated by a combination of automatic prediction, homology search and, above all, human curation. In total, 4658 protein-coding genes, 70 tRNAs and 4 snoRNAs were identified. Among these genes, 1681 have unique rice EST matches, and 1004 belong to multi-gene families, mostly arising from local duplication. There are also 2618 genes with no significant homology to Arabidopsis proteins, suggesting that some may be rice- or monocot-specific.

Japonica and indica are the two major subspecies of Oryza sativa. Comparative analysis of homologous chromosome 4 regions of japonica cv. Nipponbare and indica cv. GLA4 reveals extensive sequence collinearity between them, including gene order and content. Collinearity fluctuates along the chromosome: it is weak around the heterochromatic region, while conservation in the euchromatic region is very high.
The differences between the two subspecies are mainly SNPs and indels.

During annotation, we found that duplicated genes are highly conserved between rice and Arabidopsis at both the protein level and the gene-model level, and often have the same number of exons with identical nucleotide lengths. We call this the "Conserved Exon Length" (CEL) rule. It means that well-annotated Arabidopsis gene models can be used as a reference to facilitate the annotation of the rice genome, and vice versa. The rule can also be applied to singletons that are highly conserved at the protein level between the two genomes. That 85% of predicted Arabidopsis proteins have significant rice homologs indicates its broad potential applicability. With this rule, we can build a common gene model set for each conserved gene group and identify retroposons. It can also help determine translation start sites, predict the possible evolutionary route of gene families and detect very remote homologies.

Using the CEL rule, we searched the whole Arabidopsis genome and identified 36 retroposons, genes that originated from reverse transcription of mRNA molecules. These genes can be classified into two groups: the first group, 20 genes in total, can be actively transcribed; in the second group, 8 were integrated into the genome very recently. Among the retroposons are 23 known genes involved in the cell cycle, transcription or cellular component transport, reflecting the nature of the retroposition process. Phylogenetic analysis reveals that most of the retroposition events detectable today took place after the divergence of monocots and dicots.
Retroposition appears to be only a minor branch of evolution, since these retroposons account for only 0.13% of the Arabidopsis gene inventory.

Higher plants have evolved two different mechanisms for absorbing iron from the soil: a chelation mechanism in graminaceous plants and a reduction mechanism in all non-graminaceous plants. Detailed comparative analysis reveals that ge...
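
The CEL rule stated in the abstract (homologous genes often share the same number of exons with identical nucleotide lengths) can be illustrated with a minimal sketch. This is not the author's implementation. The gene-model representation (a list of 1-based, inclusive exon coordinates), the function names and the example coordinates are all assumptions made for illustration.

```python
from typing import List, Tuple

# Assumed representation: a gene model is an ordered list of coding exon
# segments given as (start, end) pairs, 1-based and inclusive.
GeneModel = List[Tuple[int, int]]


def exon_lengths(model: GeneModel) -> List[int]:
    """Return the nucleotide length of each exon, in order along the gene."""
    return [end - start + 1 for start, end in model]


def follows_cel(model_a: GeneModel, model_b: GeneModel) -> bool:
    """True when both models have the same exon count and identical exon lengths."""
    return exon_lengths(model_a) == exon_lengths(model_b)


def mismatched_exons(model_a: GeneModel, model_b: GeneModel) -> List[int]:
    """Return 1-based indices of exons whose lengths differ.

    Only meaningful when the exon counts match; raises ValueError otherwise.
    """
    len_a, len_b = exon_lengths(model_a), exon_lengths(model_b)
    if len(len_a) != len(len_b):
        raise ValueError("exon counts differ; models cannot be compared exon by exon")
    return [i for i, (a, b) in enumerate(zip(len_a, len_b), start=1) if a != b]


# Hypothetical coordinates, not taken from any real annotation.
arabidopsis_model = [(101, 250), (400, 489), (600, 812)]
rice_candidate = [(5001, 5150), (5900, 5989), (7200, 7412)]
rice_suspect = [(5001, 5150), (5900, 5980), (7200, 7412)]

print(follows_cel(arabidopsis_model, rice_candidate))   # same exon lengths
print(mismatched_exons(arabidopsis_model, rice_suspect))  # exon 2 differs
```

In this sketch, a candidate rice model whose exon lengths differ from a well-annotated Arabidopsis homolog would be flagged for manual review of the offending exon boundaries, which is one way the rule could assist curation.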
Keywords/Search Tags: Rice genome, Sequence annotation, Comparative genomics, CEL rule, Retroposons, Superroot