| Somatic embryogenesis is an efficient model system for studying cell totipotency and an important method for the large-scale propagation of trees. It has great application value in genetic improvement, rapid propagation, germplasm storage, genetic transformation and other fields. However, embryogenic callus of Japanese larch (Larix leptolepis) frequently converts to non-embryogenic callus during long-term subculture; some desirable genotypes have low initiation rates; abnormal embryos are common during the maturation stage; and the capacity of the somatic embryogenesis system is unstable. Understanding the mechanisms of gene regulation and expression in the developing somatic embryo is therefore a desirable way to resolve these problems. In this study, (1) using embryogenic callus of Larix leptolepis cell line 638 as experimental material, we constructed, for the first time, a full-length cDNA library of Larix somatic embryos by the SMART method; and (2) we used the 454 GS FLX Titanium sequencing platform to characterize the transcriptome of larch somatic embryos at the proembryogenic mass (PEM) stage. The main results are as follows: 1. The titers of the primary and amplified libraries were 2.25×10⁶ pfu/mL and 2.65×10⁹ pfu/mL, respectively. The recombination rate was 95.13%. Most cDNA inserts in the library ranged from 0.5 kb to 2 kb, with an average insert size of about 850 bp. These results indicate that the library is suitable for cloning and expressing target genes. 2. A total of 16 clones randomly chosen from the cDNA library were sequenced and the resulting ESTs were analyzed. Fifteen sequences were obtained. Clustering and assembly of these cDNA sequences yielded 13 unigenes, comprising 2 contigs and 11 singletons. Among them, 3 unigenes were predicted to have known functions, 2 had putative functions, 2 were unnamed or of unknown function, and 6 had no highly similar sequences in the GenBank protein database. The cDNA
library we constructed is a good source for cloning cDNAs of rare mRNAs and identifying functional genes from larch somatic embryos. 3. Transcriptome analysis of Larix somatic embryos showed the following. (1) A total of 591 759 ESTs were obtained from one half of a 454 sequencing run. De novo assembly yielded 32 321 contigs and 38 606 singletons, for a reported total of 70 792 ESTs. After ESTs shorter than 100 bp were filtered out, 65 115 high-quality ESTs remained, which indicates that 454 high-throughput sequencing is an efficient method for obtaining ESTs on a large scale. (2) Based on sequence similarity searches against the UniProt protein and CDD databases, a total of 31 077 ESTs were annotated. Of these annotated ESTs, 6 569 were assigned Gene Ontology terms. Based on the results of the UniProt and CDD analyses, ESTs were aligned to the COG database and classified into 22 COG categories. Searches against KEGG mapped 6 281 ESTs to 291 KEGG pathways. We also identified 1 121 SSRs in the ESTs. (3) BLASTX comparisons against the Arabidopsis thaliana, Oryza sativa, Populus trichocarpa and Physcomitrella patens protein databases indicated that a higher proportion of Populus genes than of Oryza, Arabidopsis or Physcomitrella genes are apparent homologues of larch sequences. (4) Similarity searches with tblastx against the PGI9.0 and SGI5.0 EST databases, Physcomitrella patens unigenes, Arabidopsis thaliana and Oryza sativa unigenes, and Populus ESTs revealed that data derived from angiosperm species alone are insufficient for annotating conifer sequences. This study generated a substantial fraction of larch transcript sequences, which will provide a rich resource for discovering and identifying new genes, characterizing gene expression, and identifying gene markers scattered across the genome for use in various applications. Additionally, these results suggest that transcriptome annotation based on 454 sequencing technology is an efficient method for
gene discovery and molecular marker development in non-model organisms, especially those with large and complex genomes. |
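
Two of the steps reported in the abstract, discarding ESTs shorter than 100 bp and scanning the remaining ESTs for SSRs, can be illustrated with a minimal Python sketch. The abstract does not name the software or the SSR criteria that were used, so everything below is an assumption made for illustration: the function names (`parse_fasta`, `filter_by_length`, `find_ssrs`) are hypothetical, and the minimum repeat counts in `SSR_MIN_REPEATS` are commonly used illustrative values, not the thresholds of the study. Only the 100 bp cutoff comes from the text.

```python
import re
from typing import Iterable, Iterator, List, Tuple

MIN_LENGTH = 100  # from the abstract: ESTs shorter than 100 bp were filtered out

# ASSUMPTION: minimum repeat counts per motif length (2-6 bp).
# The abstract does not state the SSR criteria that were actually applied.
SSR_MIN_REPEATS = {2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def parse_fasta(text: str) -> Iterator[Tuple[str, str]]:
    """Yield (name, sequence) pairs from FASTA-formatted text."""
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            fields = line[1:].split()
            name, chunks = (fields[0] if fields else ""), []
        elif name is not None:
            chunks.append(line.upper())
    if name is not None:
        yield name, "".join(chunks)


def filter_by_length(records: Iterable[Tuple[str, str]],
                     min_length: int = MIN_LENGTH) -> List[Tuple[str, str]]:
    """Keep only records whose sequence is at least min_length bases long."""
    return [(name, seq) for name, seq in records if len(seq) >= min_length]


def is_primitive(motif: str) -> bool:
    """True if the motif is not itself a repeat of a shorter unit (e.g. ATAT, AAA)."""
    return (motif + motif).find(motif, 1) == len(motif)


def find_ssrs(seq: str) -> List[Tuple[str, int, int]]:
    """Return (motif, repeat_count, start) for each perfect tandem repeat found."""
    hits = []
    for unit_len, min_rep in SSR_MIN_REPEATS.items():
        # One captured unit followed by at least (min_rep - 1) exact copies of it.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_rep - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            if not is_primitive(motif):
                continue  # already reported under its shorter unit, or a homopolymer
            hits.append((motif, len(match.group(0)) // unit_len, match.start()))
    return hits


if __name__ == "__main__":
    # Tiny made-up input, for demonstration only (not data from the study).
    demo = ">est1\n" + "ACGT" * 20 + "AG" * 12 + "\n>est2\n" + "ACGTAC" * 5 + "\n"
    for name, seq in filter_by_length(parse_fasta(demo)):
        print(name, len(seq), find_ssrs(seq))
```

The sketch reports only perfect repeats with 2 to 6 bp motifs and skips motifs that reduce to a shorter unit, so a run of `AT` is counted once as a dinucleotide repeat and not again as `ATAT`. Mononucleotide and compound SSRs are deliberately left out to keep the example short.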