RNA-seq Data Analysis And The Study Of Optimum Length Of Minimal Intron

Posted on: 2012-04-30  Degree: Master  Type: Thesis
Country: China  Candidate: J Wu  Full Text: PDF
GTID: 2180330467989017  Subject: Bioinformatics
Abstract/Summary:
With the rapid development of next-generation sequencing, high-throughput biological data can be obtained at low cost. More importantly, compared with hybridization-based methods, this technology is now widely used in transcriptome analysis and can give an accurate profile of the cell's transcriptome. When gene expression is estimated by sequencing the amount of mRNA transcribed from the genome, the estimate always depends on read coverage, which is computed under different assumptions by different methods. As a result, a single gene can receive many different expression values in the same experiment. It is therefore necessary to clarify how reads mapped to the genome are distributed. In this thesis, we survey the base content and error rates of reads, and their distribution across genes with different GC content and across regions of different GC content within the same gene. We found that error rates increase rapidly toward the tail of reads, and that the higher the GC content of a gene region, the more reads map to it.

After the Human Genome Project was completed, the genomes of more and more species were obtained by sequencing. Analyzing gene structure, we found that the intron length distribution of eukaryotic genomes shows one or two peaks, but there is always a peak between 50 and 200 bp. We call introns in this length range "minimal introns". According to previous research, these introns play an important role in evolution, as shown by their conserved positions across species, and most of them belong to housekeeping genes and are involved in alternative splicing. In conclusion, minimal introns are important to eukaryotes, but the reason why introns maintain this length distribution has remained unknown. Only limited data have been available: an analysis of insertion and deletion ratios from 2002 supported the idea that minimal introns tend to keep their length distribution.
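As a rough illustration of the "minimal intron" definition above, the following sketch derives intron lengths from exon coordinates and flags those in the 50-200 bp range. It is not the thesis's pipeline: the function names, the 1-based inclusive coordinate convention, the inclusive treatment of both range endpoints, and the 50 bp bin size are all assumptions made for the example.

```python
from collections import Counter

# Length range for "minimal introns" as stated in the abstract (bp).
# Treating both endpoints as inclusive is an assumption.
MIN_LEN, MAX_LEN = 50, 200


def intron_lengths(exons):
    """Return intron lengths for one transcript.

    `exons` is a list of (start, end) pairs in 1-based inclusive
    coordinates (an assumed convention), non-overlapping.
    """
    ordered = sorted(exons)
    return [
        next_start - prev_end - 1
        for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:])
    ]


def is_minimal(length):
    """True if an intron length falls in the minimal-intron range."""
    return MIN_LEN <= length <= MAX_LEN


def length_histogram(lengths, bin_size=50):
    """Count lengths per bin; keys are the lower edge of each bin."""
    return Counter((length // bin_size) * bin_size for length in lengths)


# Hypothetical transcript with three exons and therefore two introns.
example_exons = [(1, 100), (201, 300), (1301, 1400)]
example_lengths = intron_lengths(example_exons)
example_minimal = [n for n in example_lengths if is_minimal(n)]
```

Applied across all annotated transcripts of a genome, a histogram like this is where a peak in the 50-200 bp bins would be expected to appear.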
Here, we analyzed 179 re-sequenced individual genomes (Africans, Europeans, and Asians) from data released by the 1000 Genomes Project to study this mechanism. In our analysis, we observe that the number of indels (insertions and deletions) in introns decreases as length increases. The A and T content flanking indels fluctuates strongly for large indels. In the analysis of indel frequency, we again observe that, at high frequency, introns tend to keep the optimum length. Finally, we find that genes containing minimal introns are more conserved in their functional categories.
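The indel measurements described above can be sketched in a few lines. This is only an illustrative example under stated assumptions, not the method used in the thesis: it assumes each variant is given as a (reference allele, alternate allele) pair in the style of VCF records, and the helper names `indel_size`, `tally_indels`, and `at_fraction` are hypothetical.

```python
from collections import Counter


def indel_size(ref, alt):
    """Signed size of a variant: positive for an insertion,
    negative for a deletion, 0 when the alleles have equal length."""
    return len(alt) - len(ref)


def tally_indels(variants):
    """Count indels by absolute size, skipping equal-length variants.

    `variants` is an iterable of (ref, alt) allele pairs (assumed input).
    """
    counts = Counter()
    for ref, alt in variants:
        size = indel_size(ref, alt)
        if size != 0:
            counts[abs(size)] += 1
    return counts


def at_fraction(seq):
    """Fraction of A and T bases in a flanking sequence (0.0 if empty)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return sum(base in "AT" for base in seq) / len(seq)


# Hypothetical variants: two insertions, one deletion, one substitution.
example_variants = [("A", "AT"), ("ATG", "A"), ("C", "G"), ("A", "AGG")]
example_counts = tally_indels(example_variants)
```

Restricting the input to variants that fall inside intron coordinates, and grouping by allele frequency, would be the next steps toward the comparisons the abstract describes.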
Keywords/Search Tags: RNA-seq, minimal intron, 1000 Genomes Project, indel