Regulation And Polymorphism Analysis On Gene-associated SSRs In Plant

Posted on: 2011-01-05
Degree: Doctor
Type: Dissertation
Country: China
Candidate: L D Zhang
GTID: 1100360305956845
Subject: Biomedical engineering

Abstract/Summary:
Simple sequence repeats (SSRs), short tandemly repeated sequences, are extremely common in plant genomes. SSRs were long thought to originate from repetitive genomic DNA and were regarded as "junk" DNA without any apparent function. With the advance of genome sequencing, recent investigations have shown that SSRs are preferentially associated with nonrepetitive DNA in plant genomes. They are abundant within or near plant genes, and some types in particular are significantly enriched within the 5' regulatory regions. This implies that SSRs within regulatory regions may play vital roles in gene expression or function in plants. Investigating these over-represented SSRs will therefore help to clarify their function in plant gene regulation.

SSRs are significantly enriched in the regulatory regions of the Arabidopsis genome, and this feature is mostly attributable to the over-representation of CT/GA and CTT/GAA repeats, which account for about 60% of all SSRs in these regions. If these SSRs are important for regulating gene expression, they should be conserved in homologous promoters because of functional constraints during plant evolution. To address whether SSRs are associated with gene regulation, we used inter- and intra-genomic phylogenetic footprinting to analyze the dominant SSRs in the 5' noncoding regions of Arabidopsis and Brassica oleracea genes for conserved noncoding SSRs, or conserved noncoding microsatellite sequences (CNMSs). We identified 247 Arabidopsis-Brassica orthologous and 122 Arabidopsis paralogous CNMSs, representing 491 CT/GA and CTT/GAA repeats, which accounted for 10.6% of the repeats of these types located in the 500 bp regions upstream of coding sequences in the Arabidopsis genome.
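As an illustration of what locating the dominant repeat types in an upstream region involves, the following is a minimal Python sketch that scans a sequence for tandem runs of CT, GA, CTT and GAA. The function name `find_ssrs`, the minimum of four repeat units and the example sequence are assumptions made for this sketch; the abstract does not state the detection criteria actually used.

```python
import re

# Repeat types named in the abstract as dominant in 5' regulatory regions.
MOTIFS = ("CT", "GA", "CTT", "GAA")


def find_ssrs(seq, min_repeats=4):
    """Return (motif, start, repeat_count) for each tandem run of a motif.

    min_repeats is an illustrative threshold, not the thesis criterion.
    Only the given strand is scanned; start positions are 0-based.
    """
    seq = seq.upper()
    hits = []
    for motif in MOTIFS:
        # e.g. (?:CT){4,} matches four or more consecutive copies of CT
        pattern = re.compile("(?:%s){%d,}" % (motif, min_repeats))
        for m in pattern.finditer(seq):
            hits.append((motif, m.start(), len(m.group()) // len(motif)))
    return sorted(hits, key=lambda h: h[1])


# Hypothetical upstream fragment, for demonstration only.
upstream = "ATGC" + "CT" * 6 + "AAAT" + "GAA" * 4 + "TTG"
print(find_ssrs(upstream))
# [('CT', 4, 6), ('GAA', 20, 4)]
```

In a conservation analysis along the lines described here, hits like these would be collected from the 500 bp upstream regions of orthologous or paralogous gene pairs and then compared between the members of each pair.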
To ensure that the observation of CNMSs was not simply due to their over-representation in plant genomes, a similar analysis was carried out on three different random datasets; it indicated that some SSRs in regulatory regions have been conserved from common ancestors during plant evolution.

To gain further insight into the evolutionary relationships of the Arabidopsis-Brassica and Arabidopsis-Arabidopsis CNMSs, the synonymous substitution rate (Ks) was calculated for the corresponding gene pairs. The frequency distribution of Ks suggested that the Arabidopsis-Brassica orthologous CNMSs have been conserved from a common ancestor over a period of 15 million years (Myr), while most of the paralogous CNMSs originated from a large-scale gene duplication over 28 Myr ago and the others were duplicated from the common ancestor of the Brassicaceae family over 42 Myr ago. These results suggested that most paralogous CNMSs pre-dated the divergence of the two species. Paralogous and orthologous genes from Arabidopsis and Brassica were further compared for common CNMSs. Using the same criteria, we identified 18 Ultra-CNMSs, found in Arabidopsis paralogous pairs, that also coincided with CNMSs from at least one ortholog in Brassica; many Ultra-CNMSs were conserved across a number of more distantly homologous genes in Brassicaceae species and other plants.

Functional annotation based on Gene Ontology showed that 206 Arabidopsis-Brassica and 194 Arabidopsis-Arabidopsis CNMS-associated genes had known functions, and that these functions were significantly enriched for transcription factor activity and transcription regulation. These findings suggested that CNMSs might be specifically associated with the regulation of transcription.
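The dating of orthologous and paralogous CNMSs rests on converting Ks values into divergence times. A minimal sketch of the commonly used conversion, T = Ks / (2r), is given below. The rate r = 1.5e-8 synonymous substitutions per site per year is an assumption (a value often used for Brassicaceae), not a figure stated in this abstract, and the function name `ks_to_myr` is illustrative.

```python
# Assumed synonymous substitution rate (per site per year); the abstract
# does not state which rate was used, so treat this as a placeholder.
ASSUMED_RATE = 1.5e-8


def ks_to_myr(ks, rate=ASSUMED_RATE):
    """Convert a synonymous substitution rate Ks to divergence time in Myr.

    Uses T = Ks / (2 * rate): both lineages accumulate substitutions
    independently after the split, hence the factor of 2.
    """
    if ks < 0 or rate <= 0:
        raise ValueError("ks must be non-negative and rate must be positive")
    return ks / (2.0 * rate) / 1e6


# Hypothetical Ks value, chosen only to show the arithmetic.
print(round(ks_to_myr(0.45), 1))
```

Under the assumed rate, a Ks of 0.45 corresponds to about 15 Myr; a different rate would shift every date proportionally, so the estimates are only as reliable as the rate chosen.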
Computational prediction of cis-acting elements revealed that the CNMS (CT)n/(GA)n was similar to a known motif involved in light responsiveness, and that the CNMS (CTT)n/(GAA)n was involved in salicylic acid responsiveness. Transcript abundance evaluated by massively parallel signature sequencing (MPSS) showed that about 70%-80% of the CNMS (CTT)n/(GAA)n-associated genes in Arabidopsis leaves were regulated by salicylic acid. Seven CNMS (CTT)n/(GAA)n-associated genes were additionally analyzed by RT-PCR for their expression patterns after salicylic acid treatment. The expression of these genes was consistent with the expression patterns in the Arabidopsis MPSS database.

To validate the CTT/GAA repeats as salicylic acid-responsive elements, four 5' deletions of the salicylic acid-induced, CTT repeat-containing AtHip1 promoter were fused to the β-glucuronidase (GUS) gene and introduced into Arabidopsis plants. Histochemical assays of GUS activity and real-time PCR analysis of gus gene expression in the promoter-transformed plants revealed that the region of the AtHip1 promoter from -399 to -184 (216 bp) relative to the transcription start site is the core promoter for transcriptional regulation. Deletion of this region abolished the salicylic acid responsiveness of the AtHip1 promoter. Bioinformatic analysis revealed no known salicylic acid-inducible elements in this region other than the CTT repeat element. Taken together, these results demonstrate that CTT tandem repeats within 5' regions act as cis-acting elements and play important roles in salicylic acid regulation.

Expressed sequence tag (EST)-derived SSRs, as genetic markers, are specifically associated with gene expression and function. The large number of ESTs in databases is a valuable resource for developing SSR markers. EST databases may contain redundant sequences of a particular gene, such as different alleles derived from heterozygous individuals or from different genotypes.
Some redundant ESTs can therefore carry information on length polymorphisms in SSRs. We developed an in silico tool for identifying polymorphic SSRs based on EST sequence redundancy. Using this tool, we identified 15,640 polymorphic EST-derived SSRs from maize, soybean, rice, wheat, rape, barley, cotton, tomato, potato and sorghum. The percentage of polymorphic SSRs ranged from 0.7% for tomato to 2.61% for maize. The EST-derived SSRs consisted mainly of dinucleotide and trinucleotide repeats, which accounted for 84% of all identified polymorphic EST-SSRs. The length polymorphism of the 15,640 identified EST-SSRs revealed a mutational bias: alleles tend to increase in size.

EST-SSRs are derived from transcripts. Homology analysis of the 15,640 identified polymorphic EST-SSRs indicated that the in silico EST-SSRs had a high level of transferability across crop species; the percentage of transferability ranged from 14.1% for rape to 45.9% for sorghum, with grass species such as sorghum (45.9%), wheat (39.1%) and barley (38.2%) at the upper end. Large-scale identification of polymorphic EST-SSRs by an in silico approach greatly improves the efficiency of marker development. On the basis of EST-SSR transferability, it is practicable to develop new molecular markers for plants for which little sequence information is available.

Each unique EST with polymorphic SSRs was searched against the UniProt/Swiss-Prot database by BLAST, and the assigned UniProt/Swiss-Prot IDs were classified into categories according to GO terms using the Plant GO Slim. The results showed that 8,952 of the 14,084 unique ESTs were associated with 108,601 GO annotations.
Functional categorization revealed that ESTs with polymorphic SSRs were mainly involved in biological processes such as protein metabolism, transport, transcription, response to stress, developmental processes and signal transduction, while their molecular functions were preferentially associated with protein binding, DNA or RNA binding, hydrolase activity and transferase activity.

To facilitate access to this resource of polymorphic EST-SSRs from crops, we developed a database providing detailed information on these EST-SSRs, such as SSR motif, allele length, cultivar, gene function and primers. The database also provides a view of the EST assembly and a BLAST-based homology analysis of SSR-containing ESTs among related species. The online EST-SSR database service was implemented in Perl and MySQL, and the data are available for download.
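The idea of detecting polymorphic SSRs from EST redundancy, described earlier in the abstract, can be sketched as follows. This is a simplified illustration, not the tool developed in the dissertation: it assumes the ESTs have already been grouped into clusters of the same gene, and it treats two SSRs as the same locus when they share a motif and an identical left flank. The names `polymorphic_ssrs` and `SSR_RE`, the flank length and the thresholds are all hypothetical.

```python
import re
from collections import defaultdict

# A di- or trinucleotide motif repeated at least 4 times in tandem
# (illustrative threshold only).
SSR_RE = re.compile(r"([ACGT]{2,3}?)\1{3,}")


def polymorphic_ssrs(clusters, flank=8):
    """Report SSR loci whose repeat count differs among ESTs of a cluster.

    clusters: dict mapping a cluster id to a list of EST sequences.
    Returns {(cluster_id, motif, left_flank): sorted repeat counts}.
    """
    results = {}
    for cid, ests in clusters.items():
        alleles = defaultdict(set)
        for seq in ests:
            seq = seq.upper()
            for m in SSR_RE.finditer(seq):
                motif = m.group(1)
                if len(set(motif)) == 1:
                    continue  # skip homopolymer runs such as AAAA...
                # Simplified locus identity: motif plus left flanking bases.
                left = seq[max(0, m.start() - flank):m.start()]
                alleles[(motif, left)].add(len(m.group(0)) // len(motif))
        for (motif, left), counts in alleles.items():
            if len(counts) > 1:  # more than one allele length observed
                results[(cid, motif, left)] = sorted(counts)
    return results


# Hypothetical EST clusters, for demonstration only.
clusters = {
    "cluster1": [
        "ACGTTGCA" + "CT" * 6 + "GGATCCAT",
        "ACGTTGCA" + "CT" * 8 + "GGATCCAT",
        "ACGTTGCA" + "CT" * 6 + "GGATCCAT",
    ],
    "cluster2": ["TTGACCGT" + "GAA" * 5 + "CCGTA"] * 2,
}
print(polymorphic_ssrs(clusters))
# {('cluster1', 'CT', 'ACGTTGCA'): [6, 8]}
```

Exact flank matching is the weakest part of this sketch: sequencing errors in the flank would split one locus into two, so a real pipeline would more likely compare SSRs within an assembled alignment of the redundant ESTs.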
Keywords/Search Tags: simple sequence repeats, cis-acting elements, expressed sequence tags, EST-SSRs, bioinformatic database