Research On Pseudogenes In Rice Genome

Posted on: 2006-04-30
Degree: Master
Type: Thesis
Country: China
Candidate: Z H Huang
Full Text: PDF
GTID: 2133360152494155
Subject: Crop Genetics and Breeding
Abstract/Summary:
Rice is one of the world's most important crops, and its whole genome sequence has been completed. Pseudogenes are genomic DNA sequences so similar to normal genes that they are regarded as non-functional copies or close relatives of genes, and they are a hotspot of genome research. Because they retain information about ancient genes, pseudogenes are regarded as biological fossils of genome evolution and as a key to studying biological evolution and genome dynamics. Applying pseudogene analysis to the rice genome should uncover more of the biological and genetic characteristics of rice and let us explore genetic events such as gene duplication, gene mutation, and the insertion, substitution and deletion of DNA sequence, providing more scientific evidence for rice cultivation.

Large numbers of pseudogene sequences can be collected from genomic DNA by homology search. The usual steps are: collecting the genomic DNA sequence, six-frame BLAST search, realignment with FASTA, removing false gene data, merging with other pseudogene data, and dividing the results into processed and unprocessed pseudogenes. The resulting pseudogenes can then be examined for population size, sequence length distribution, GC-content distribution, homologous families and so on. In this work we used tools including BLAST, FASTA, CLUSTAL, GO, SIM4 and PAML, and developed a series of BioPerl-based programs for sequence analysis to obtain the different types of pseudogene data.

The genomic DNA sequences of indica rice and japonica rice are from the Beijing Genomics Institute, Chinese Academy of Sciences. The protein data were collected from international databases and from translations of full-length cDNA. After homology search, BLAST and filtering, we found 59504 pseudogenes in japonica rice and 179924 in indica rice. Pseudogenes are abundant in both subspecies (pseudogene/gene ratio above one), and the two sets share many characteristics. Japonica rice has 17629 processed and 41875 unprocessed pseudogenes; the corresponding numbers in indica rice are 51964 and 127960. Unprocessed pseudogenes clearly outnumber processed ones. The number of pseudogenes differs between chromosomes: larger chromosomes, which carry more genes, carry more pseudogenes, and chromosome 7 in particular has a much higher pseudogene/gene ratio than the others. The two subspecies have similar sequence length distributions in terms of mean and maximum. 90% of pseudogenes are between 100 and 1000 bp long, and processed pseudogenes tend to be shorter. The GC-content distribution of pseudogenes is close to that of genes, which are generally GC-rich. This suggests that pseudogenes are not the "junk" sequences they were previously thought to be. Within a single chromosome, pseudogenes cluster between the centromere and the telomeres, where GC-rich genes also cluster; this may point to hot spots of genome evolution. The largest homologous protein family is that of protein enzymes, while the other families differ between the two subspecies. We cannot conclude that pseudogenes serve as backups of genes for coping with environmental change.
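The per-sequence statistics mentioned in the abstract (length distribution, share of sequences between 100 and 1000 bp, GC content) can be illustrated with a minimal Python sketch. This is not the thesis's BioPerl code, which is not shown here; the function names `gc_content` and `summarize` and the toy sequences are assumptions made only for illustration.

```python
from collections import Counter


def gc_content(seq: str) -> float:
    """Fraction of G and C among unambiguous bases (A, C, G, T)."""
    counts = Counter(seq.upper())
    gc = counts["G"] + counts["C"]
    total = gc + counts["A"] + counts["T"]
    # Sequences with no unambiguous bases (e.g. all N) get 0.0.
    return gc / total if total else 0.0


def summarize(seqs, low=100, high=1000):
    """Length and GC summary for a non-empty list of DNA sequences.

    `low` and `high` bound the length window (in bp) whose share is
    reported; the defaults mirror the 100-1000 bp range in the abstract.
    """
    if not seqs:
        raise ValueError("need at least one sequence")
    lengths = [len(s) for s in seqs]
    in_range = sum(low <= n <= high for n in lengths)
    return {
        "count": len(seqs),
        "mean_length": sum(lengths) / len(lengths),
        "max_length": max(lengths),
        "fraction_in_range": in_range / len(lengths),
        "mean_gc": sum(gc_content(s) for s in seqs) / len(seqs),
    }


# Toy sequences (hypothetical, not data from the thesis).
toy = ["GC" * 60, "AT" * 100, "GGCA" * 300]
stats = summarize(toy)
print(stats)
```

Running `summarize` separately on the processed and unprocessed sets, or per chromosome, would give the kind of comparison the abstract describes.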
Keywords/Search Tags: rice, pseudogene, genome, bioinformatics, evolution