| The soybean genome sequence has been completed and released. Full-length cDNA libraries provide more precise genomic sequence information and help promote the effective use of soybean genome data and the development of functional genomics research, so full-length cDNA library construction has become an important approach in molecular biology. In this study, four high-quality full-length cDNA libraries were constructed from soybean (Glycine max L.) cultivar Suinong14 by the SMART method, and 2,071 full-length genes were obtained. We annotated their gene structure, function and metabolic pathways, and examined codon usage bias in soybean genomic genes. The diversity of the gene Glyma13g21630, identified from the cDNA library, was analyzed in G. max and G. soja. These results improve the annotation of the soybean genome and support research on the soybean transcriptome. They also provide a basis for codon modification in transgenic research and some theoretical background for molecular domestication in soybean. The main results were as follows: 1. Construction of high-quality full-length cDNA libraries from soybean cultivar Suinong14. Four high-quality Suinong14 full-length cDNA libraries were constructed by the SMART method from normal leaves, SMV-infected leaves, seeds and mixed tissues (the first three libraries were completed by others in our laboratory). For the cDNA library from the mixed tissues, the recombination ratio was 99.56%, the library capacity was 1.2×10^6 and the library titer was 3.1×10^7 pfu/ml. We picked 2,064 clones and obtained 1,949 sequences with a total length of 2,348,703 bp. A total of 1,031 sequences were obtained by bidirectional sequencing, and 1,698 effective sequences remained after vector sequences were removed. After integrating these data with those generated earlier in our laboratory, we obtained a total of 8,818 EST sequences consisting of 7,356 5' ESTs and 462 3' ESTs. 2. 
Structural and functional annotation of the 2,071 full-length transcripts from soybean cultivar Suinong14. A total of 2,071 full-length genes with an average length of 595.14 bp were identified from the Suinong14 full-length cDNA libraries, of which 1,512 (73%) were longer than 500 bp. Gene structure analysis and statistical analysis of the coding regions showed that the proportions of Leu, Ser and Arg were higher than in other plants such as Arabidopsis and rice, and that soybean had its highest GC content at the third codon position, which differs from other organisms. The full-length genes were mapped to the 20 soybean chromosomes. The full-length cDNA sequences were classified by GO function based on gene sequence homology and by COG function based on protein sequence homology. KEGG metabolic pathway maps were drawn using the soybean genome data as a reference. 3. Analysis of codon usage bias in the soybean nuclear genome. A total of 46,430 high-confidence coding sequences in Phytozome (soybean genome database) and 2,071 full-length transcripts were used to analyze the composition and characteristics of soybean nuclear gene codons. CodonW software was used to calculate the nucleotide composition, relative synonymous codon usage and other parameters of the soybean genome and transcriptome. The results indicated that gene expression level was significantly and positively correlated with G+C and GC3s contents, and that genes with high G+C and GC3s contents had stronger codon preference. UCC and GCC were identified as optimal codons in soybean. Analysis of coding sequences of different lengths showed that codon preference decreased as coding sequence (CDS) length increased, and that longer CDSs tended to use codons more randomly. CDSs of 400 to 600 bp had the highest expression levels in the soybean transcriptome data. Leaf-specific and seed-specific genes were similar in codon preference and expression level. 
However, seed-specific genes had significantly higher G+C and GC3s contents than leaf-specific genes, and the content of aromatic amino acids encoded by seed-specific genes was highly significantly lower than that of leaf-specific genes. 4. Analysis of Glyma13g21630 gene diversity in cultivated (G. max) and wild (G. soja) soybeans. A total of 29 polymorphic sites were identified in 133 soybean accessions, comprising 49 wild soybeans and 84 cultivated soybeans (46 landraces and 38 cultivars); these sites included 22 SNPs and 7 InDels, with frequencies of 1 SNP/138 bp and 1 InDel/434 bp, respectively. Nucleotide variation was concentrated in the third and fifth introns, with less variation in other regions. Haplotype analysis indicated that the number of polymorphic loci decreased from wild soybean to cultivated soybean, and that their distribution range narrowed correspondingly. Linkage disequilibrium analysis showed that 42.86% of SNP sites were in significant linkage disequilibrium in wild soybean. The high Ka/Ks ratio indicated that some sites were under strong positive selection pressure, which reduced polymorphism. The favored variants of Glyma13g21630 have been fixed in cultivated soybean, which also shows a bottleneck effect. |
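
The codon usage parameters reported in result 3 (relative synonymous codon usage and GC3s) were calculated with CodonW. As a rough illustration of what these two quantities measure, the following minimal Python sketch assumes the standard genetic code and in-frame coding sequences. The function names and the toy sequence are hypothetical, and this is not CodonW's implementation. Codons are written in DNA form, so the optimal codon UCC appears as TCC.

```python
from collections import Counter
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (assumption: nuclear genes).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def codons(cds):
    """Split an in-frame coding sequence into codons (incomplete tail dropped)."""
    cds = cds.upper().replace("U", "T")
    usable = len(cds) - len(cds) % 3
    return [cds[i:i + 3] for i in range(0, usable, 3)]


def rscu(cds_list):
    """Relative synonymous codon usage: observed count of a codon divided by
    the mean count of all codons encoding the same amino acid."""
    counts = Counter(c for cds in cds_list for c in codons(cds) if c in CODON_TABLE)
    families = {}
    for codon, aa in CODON_TABLE.items():
        families.setdefault(aa, []).append(codon)
    values = {}
    for aa, family in families.items():
        if aa == "*" or len(family) == 1:  # skip stops and Met/Trp
            continue
        total = sum(counts[c] for c in family)
        if total == 0:  # amino acid absent from the input
            continue
        for c in family:
            values[c] = counts[c] * len(family) / total
    return values


def gc3s(cds):
    """Fraction of G or C at the third position of synonymous codons
    (Met, Trp and stop codons are excluded)."""
    third = [c[2] for c in codons(cds) if CODON_TABLE.get(c, "*") not in "*MW"]
    if not third:
        return None
    return sum(base in "GC" for base in third) / len(third)


# Hypothetical toy sequence, for illustration only.
toy = "ATGTCCTCCTCTGCCGCTTGGTAA"
toy_rscu = rscu([toy])
toy_gc3s = gc3s(toy)
```

An RSCU value above 1 means the codon is used more often than expected under equal use of its synonyms. In the toy sequence, TCC accounts for two of the three serine codons, so its RSCU is 4.0 for the six-codon serine family.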