Construction And Sequence Analysis Of Wheat Full-length cDNA Libraries

Posted on: 2007-02-28
Degree: Doctor
Type: Dissertation
Country: China
Candidate: G Y Zhao
GTID: 1103360185955463
Subject: Biochemistry and Molecular Biology
Abstract/Summary:
Hexaploid wheat (Triticum aestivum L., 2n = 6x = 42) is one of the world's cornerstone crops. It feeds more people than any other crop and is the most widely adapted of the major crops, and therefore offers potential for increased food production. Studying the wheat genome is technically difficult: it is an allohexaploid of about 16,000 Mb, some 40 times the size of the rice (Oryza sativa L.) genome, and about 90% of it consists of repetitive sequences. As a result, wheat genomics research lags behind that of model plants with small genomes, such as rice and Arabidopsis thaliana. Under these circumstances it is especially important to clone genes for disease resistance, pest resistance, and abiotic stress tolerance, as well as genes or QTL for important agronomic traits such as yield. The construction and sequencing of full-length cDNA libraries have recently become an important part of plant functional genomics. Because most clones in a full-length cDNA library contain not only the CDS but also the 5' and 3' UTRs, which facilitates later work such as sequencing and structural and functional analysis, the full-length cDNA library is regarded as an effective approach for discovering new genes on a large scale.

Based on the published Cap-trapper method, an important method for constructing full-length cDNA libraries, we modified some of its steps and built our own full-length cDNA construction platform. Using the modified Cap-trapper method, we constructed ten full-length cDNA libraries from the tribe Triticeae. The libraries were derived from several species and genotypes: the elite common wheat variety Yanzhan No. 1; Chinese Spring, an internationally recognized genotype for basic wheat research; and the donors of the wheat A, B, and D genomes, T. urartu, Ae. speltoides ssp. speltoides, and Ae. tauschii ssp. strangulata.
The libraries were also derived from diverse tissues, developmental stages, and treatments, including root, shoot, young seedling, spikelet, anther, callus, young embryo, and endosperm, as well as material induced by inoculation with the wheat powdery mildew pathogen. Library quality checks indicated that the number of independent clones ranged from 6.0 × 10^5 to 5.0 × 10^6 and that the insert size was about 1.5 kb. The titer of the amplified libraries was on the order of 10^10 pfu/ml. Ten high-quality full-length cDNA libraries were thus obtained in our lab and can be used for further experiments such as clone sequencing and gene screening. The library inserts were ligated into the Stratagene Uni-ZAP XR vector and are maintained in phage form.

To obtain as many full-length cDNA sequences as possible, we used the following strategy. Approximately 100,000 3' EST sequences were produced from the ten full-length cDNA libraries with the T7 primer and assembled into representative sequences (contigs and singlets). Clones selected according to the representative sequences were then sequenced from the 5' end with the T3 primer.

Sequencing was done on an ABI Prism 3730 XL sequencer, and sequence and quality files were read from the trace files by the phred program using a quality score setting of 20. This allowed the definition of a quality read length (QRL) with fewer than 1 error per 100 bases. From these QRLs, high-quality sequence data were extracted after removing vector ends, primer and linker sequences, and short sequences, and were then filtered for sequences from E. coli, plastids, repetitive sequences, and other sequence anomalies. Sequences shorter than 100 bp were also removed. Over the course of the project, 30,586 5' ESTs and 95,736 3' ESTs were obtained, for a total of 126,322 ESTs and 7.46 × 10^7 bp of sequence. To assess the uniqueness of the sequences within the collections, the sequence pools were assembled with the CAP3 program.
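The quality cutoff of 20 used in the read-cleaning step can be related to an error rate through the standard Phred definition, Q = -10 log10(P): a score of 20 corresponds to a per-base error probability of 0.01, so every base at or above the cutoff has an error probability of at most 1 in 100. The sketch below illustrates that conversion. The function names are hypothetical, and the "longest run of bases at or above the cutoff" rule is only an assumed simplification of how a quality read length might be delimited; it is not the phred program's own trimming algorithm.

```python
import math


def phred_to_error_prob(q):
    """Convert a Phred quality score Q to the estimated base-call error probability."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p):
    """Convert an error probability P back to a Phred quality score."""
    return -10 * math.log10(p)


def longest_high_quality_run(quals, cutoff=20):
    """Return (start, end) of the longest run of consecutive bases with Q >= cutoff.

    Assumed simplification for illustration only; 'end' is exclusive,
    so the run is quals[start:end]. Returns (0, 0) if no base passes.
    """
    best_start, best_end = 0, 0
    run_start = None
    for i, q in enumerate(quals):
        if q >= cutoff:
            if run_start is None:
                run_start = i
            if i + 1 - run_start > best_end - best_start:
                best_start, best_end = run_start, i + 1
        else:
            run_start = None
    return best_start, best_end


if __name__ == "__main__":
    print(phred_to_error_prob(20))
    print(longest_high_quality_run([10, 25, 30, 22, 5, 40, 40]))
```

Reads whose retained window falls below a minimum length (100 bp in the text) would then be discarded in a separate step.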
Assemblies were performed on all sequences that passed the sequence cleaning process. CAP3 was run with -o 50 -p 95 (other parameters at default), allowing like sequences with 90% identity over a 100 bp length to form contig clusters. In addition, assemblies were performed on the cDNA libraries individually to assess the level of library redundancy. In the end, 32,899 representative sequences (contigs and singlets) were generated and are available for further analysis.

GC content analysis indicated that wheat expressed genes had a GC content of 53.99%, which is consistent with that of rice. However, the GC content of the 5' ESTs (57.80%) was much higher than that of the 3' ESTs (52.78%).

All 32,899 representative sequences were compared with 879,995 wheat ESTs and 1,191,102 rice ESTs from NCBI, 32,127 rice full-length cDNA sequences, and the BGI rice genome sequence using a local BLASTN program. The results indicated that 8,800 unique sequences (26.75%) were new relative to the wheat ESTs (E-value 1e-20); 15,992 (48.6%) did not occur among the rice full-length cDNA sequences (E-value 1e-5); 18,672 (56.75%) were new relative to the NCBI rice ESTs (E-value 1e-20); and 14,382 (43.72%) had no hit against the BGI rice genome sequence (E-value 1e-5).

Codon usage bias is a characteristic of an organism that is shaped during evolution. Studies have indicated that many factors affect codon usage, such as GC content, gene length, and tRNA abundance. We analyzed the codon usage of 760 genes from our full-length cDNA sequence collection. The statistics suggested that the GC content at the third position of synonymous codons is the same as the overall GC content of wheat genes. Among the synonymous codons of wheat genes there were 27 optimal codons, and the third-position base of each of these codons was G or C.
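The two GC measures discussed above, overall GC content and GC content at the third codon position, can be computed along the following lines. This is a minimal sketch under stated assumptions rather than the analysis actually used in the dissertation: the function names are hypothetical, the coding sequence is assumed to be in frame starting at its first base, and third positions of all codons are counted (a stricter "synonymous codons only" measure would also exclude Met, Trp, and stop codons).

```python
def gc_content(seq):
    """Percent G+C among unambiguous bases (A, C, G, T) of a nucleotide sequence."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0
    return 100.0 * (seq.count("G") + seq.count("C")) / acgt


def gc3_content(cds):
    """Percent G+C at third codon positions of an in-frame coding sequence.

    Assumption: the CDS starts at codon position 1; any trailing
    partial codon is ignored.
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3      # drop an incomplete final codon
    third_positions = cds[2:usable:3]     # bases at indices 2, 5, 8, ...
    return gc_content(third_positions)


if __name__ == "__main__":
    toy_cds = "ATGGCCTAA"  # made-up example: codons ATG, GCC, TAA
    print(gc_content(toy_cds))
    print(gc3_content(toy_cds))
```

Applied per gene across a collection, the two values can be compared directly to see whether third-position GC content tracks the overall GC content.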
This G- or C-ending pattern is the same as in rice but very different from Arabidopsis, in which the third-position base of the corresponding codons is A or T.

Each of the 32,899 unique sequences was also searched against the UniProt database using BLASTX, and best matches (E-value
Keywords/Search Tags:wheat, full-length cDNA library, GC content, codon bias, GO annotation, bioinformatics