Distribution Of Insertion And Deletion In Genome

Posted on: 2009-07-07
Degree: Master
Type: Thesis
Country: China
Candidate: Y H Fan
GTID: 2120360245451054
Subject: Biochemistry and Molecular Biology
Abstract/Summary:
The rates and patterns of indels (insertions and deletions) and substitutions have been studied in the rodent (mouse and rat) genomes and in 18 mammalian genomes. The results reveal that deletions occur more frequently than insertions, and that single-nucleotide insertions and deletions are the most frequent events in every data set. The deletion bias found in mouse and rat introns supports the prediction that intronic insertions are more deleterious than deletions because they reduce transcription and splicing efficiency. The frequencies of both deletions and insertions decrease rapidly with increasing indel length, and the size distributions of both are well described by a power law. The substitution patterns suggest that neither base composition nor GC content is at equilibrium in rodent introns. We found that the indel density between mouse and rat introns is about 0.014 indels per site, and that the observed indels are A- and T-rich and over-represented in retrotransposons and tandem repeats. The base composition of indels is independent of the base composition of the introns in rodents. The observed number of G nucleotides in single-base-pair indels, and of CG dinucleotides in two-base-pair indels, exceeds the expected number. Furthermore, the observed numbers of CG-containing trinucleotides in three-base-pair indels also exceed the expected numbers. These results suggest a CG bias in indels in rodent introns. In general, deletions are relatively CG-richer than insertions in mouse introns, whereas insertions are relatively CG-richer than deletions in rat introns, which suggests a relative increase of GC content in rat introns. In mouse, 18894 insertions and 28051 deletions contain repeat elements; in rat, 16666 insertions and 40377 deletions do.
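The power-law description of indel lengths can be checked by estimating the exponent from a list of indel lengths. The following is a minimal sketch, assuming the approximate discrete maximum-likelihood estimator alpha ≈ 1 + n / Σ ln(x_i / (x_min − 0.5)) described by Clauset, Shalizi and Newman; the abstract does not say which fitting method the thesis used, and the function names and toy lengths are hypothetical.

```python
import math
from collections import Counter


def length_frequencies(lengths):
    """Count how many indels have each length, ordered by length."""
    return dict(sorted(Counter(lengths).items()))


def fit_power_law_exponent(lengths, x_min=1):
    """Approximate MLE of alpha for a discrete power law P(L = x) ~ x**(-alpha), x >= x_min.

    Uses alpha ≈ 1 + n / sum(ln(x / (x_min - 0.5))), the continuous
    approximation to the discrete maximum-likelihood estimator.
    """
    if x_min < 1:
        raise ValueError("x_min must be at least 1 for indel lengths")
    tail = [x for x in lengths if x >= x_min]
    if not tail:
        raise ValueError("no indel lengths at or above x_min")
    # Every term is positive because x >= x_min > x_min - 0.5.
    log_sum = sum(math.log(x / (x_min - 0.5)) for x in tail)
    return 1.0 + len(tail) / log_sum


# Hypothetical toy deletion lengths, not data from the thesis.
toy_deletion_lengths = [1] * 60 + [2] * 20 + [3] * 9 + [4] * 5 + [5] * 3 + [8] * 2 + [12]
print(length_frequencies(toy_deletion_lengths))
print(round(fit_power_law_exponent(toy_deletion_lengths), 2))
```

This approximation is known to be rough when x_min is as small as 1, so for real indel data an exact discrete fit (numerically maximising the Hurwitz-zeta likelihood) and a goodness-of-fit test would be the safer choice.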
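The CG-bias comparison depends on an expected count derived from base composition. A minimal sketch, assuming a zero-order (independent-site) null model in which each base of a length-k indel is drawn from the background intron base frequencies, so the expected count of a k-mer w among n_k indels of length k is n_k × Π p(b) over the bases b of w. The abstract does not state the exact null model used, and `base_frequencies`, `observed_vs_expected`, `cg_trinucleotides` and the toy sequences are hypothetical.

```python
from collections import Counter
from itertools import product
from math import prod


def base_frequencies(seq):
    """Frequency of A, C, G and T in seq; other symbols (N, gaps) are ignored."""
    counts = Counter(b for b in seq.upper() if b in "ACGT")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("background sequence contains no A/C/G/T")
    return {b: counts[b] / total for b in "ACGT"}


def observed_vs_expected(indels, word, background):
    """Observed vs composition-expected count of `word` among indels of the same length.

    Null model (an assumption): every base of a length-k indel is drawn
    independently from the background frequencies, so
    E[count] = n_k * prod(p[b] for b in word).
    """
    word = word.upper()
    freqs = base_frequencies(background)
    same_length = [s.upper() for s in indels if len(s) == len(word)]
    observed = sum(1 for s in same_length if s == word)
    expected = len(same_length) * prod(freqs[b] for b in word)
    return observed, expected


def cg_trinucleotides():
    """The 8 trinucleotides that contain a CG dinucleotide."""
    words = ("".join(t) for t in product("ACGT", repeat=3))
    return [w for w in words if "CG" in w]


# Hypothetical toy data, not sequences from the thesis.
intron = "ATTGCATAAATGCGTTTACAGTTATAAACG"
two_bp_indels = ["CG", "AT", "TA", "CG", "AA", "TT"]
print(observed_vs_expected(two_bp_indels, "CG", intron))

three_bp_indels = ["ACG", "TTT", "CGA", "ATA"]
pairs = [observed_vs_expected(three_bp_indels, w, intron) for w in cg_trinucleotides()]
print(sum(o for o, _ in pairs), sum(e for _, e in pairs))
```

Summing over all eight CG-containing trinucleotides mirrors the three-base-pair comparison in the abstract; a significance test (for example a binomial or chi-square test on observed vs expected) would still be needed before calling an excess a bias.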
Among the interspersed repeats (SINEs, LINEs, LTR elements and DNA elements), SINEs (Short INterspersed Elements) and LINEs (Long INterspersed Elements) occur most frequently, LTR (Long Terminal Repeat) elements less often, and DNA elements least often. Interspersed repeats account for about 33% and 35% of the total length of inserted sequence in rat and mouse, respectively, and for 18% and 15% of the total length of deleted sequence in rat and mouse, respectively. The most frequent families are Alu/B1 and B2-B4 among SINEs, LINE1 among LINEs, MaLRs and ERV_classII among LTR elements, and MER1_type among DNA elements. We also confirmed that the majority of the divergence between related sequences is due to indels.
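The repeat percentages above are shares of total inserted or deleted length rather than counts of events. A minimal sketch of that bookkeeping, assuming each indel has already been annotated (for example by a repeat-masking tool) with the number of its bases covered by each repeat class; the record layout, the class labels and the names `repeat_length_shares` and `interspersed_share` are hypothetical.

```python
from collections import defaultdict

# Hypothetical class labels; real annotations would come from a repeat-masking tool.
INTERSPERSED = {"SINE", "LINE", "LTR", "DNA"}


def repeat_length_shares(indels):
    """Percent of total indel length covered by each repeat class.

    `indels` is a list of (length, {repeat_class: covered_bases}) pairs.
    Bases not assigned to any class count only toward the total length.
    """
    total = sum(length for length, _ in indels)
    if total == 0:
        raise ValueError("total indel length is zero")
    covered = defaultdict(int)
    for length, annotation in indels:
        if sum(annotation.values()) > length:
            raise ValueError("annotated bases exceed indel length")
        for repeat_class, bases in annotation.items():
            covered[repeat_class] += bases
    return {cls: 100.0 * bases / total for cls, bases in covered.items()}


def interspersed_share(shares):
    """Combined percentage for the interspersed classes (SINE, LINE, LTR, DNA)."""
    return sum(pct for cls, pct in shares.items() if cls in INTERSPERSED)


# Hypothetical toy insertions, not data from the thesis.
toy_insertions = [
    (300, {"SINE": 280}),
    (45, {}),
    (6000, {"LINE": 5900}),
    (12, {"Simple_repeat": 12}),
]
shares = repeat_length_shares(toy_insertions)
print(shares)
print(interspersed_share(shares))
```

Weighting by length explains how a few long LINE insertions can dominate the inserted-sequence share even when SINE-containing events are more numerous.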
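Whether indels or substitutions account for more divergence depends on counting affected bases rather than events: one long indel can outweigh many point substitutions. A minimal sketch, assuming divergence is measured per aligned column of a pairwise alignment written with "-" gaps; the abstract does not state its exact measure, and `divergence_breakdown` and `indel_fraction` are hypothetical names.

```python
def divergence_breakdown(aligned_a, aligned_b, gap="-"):
    """Split alignment divergence into substituted bases and indel bases.

    Each mismatched column with no gap counts as one substitution; each
    column with a gap in exactly one sequence counts as one indel base.
    Columns gapped in both sequences are ignored.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    substitutions = indel_bases = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        a_gap, b_gap = a == gap, b == gap
        if a_gap and b_gap:
            continue
        if a_gap or b_gap:
            indel_bases += 1
        elif a != b:
            substitutions += 1
    return substitutions, indel_bases


def indel_fraction(aligned_a, aligned_b, gap="-"):
    """Fraction of divergent aligned columns that are indel bases."""
    subs, indels = divergence_breakdown(aligned_a, aligned_b, gap)
    diverged = subs + indels
    return indels / diverged if diverged else 0.0


# Hypothetical toy alignment, not data from the thesis.
print(divergence_breakdown("ACGT--AC", "ACTTGGA-"))
print(indel_fraction("ACGT--AC", "ACTTGGA-"))
```

Under this per-base measure a value above 0.5 would mean indels contribute more diverged positions than substitutions; counting each indel as a single event instead would give a very different answer.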
Keywords/Search Tags: indel, substitution, base composition bias, indel distribution, density of indel