| As a next-generation sequencing technology, transcriptome or RNA sequencing (RNA-seq) has been widely used for differential gene expression analysis and gene annotation research in many species. A variety of software packages for RNA-seq data analysis are available. However, a practical analysis involves several complicated steps and many parameters, so it is difficult for most researchers to perform such an analysis by themselves. To make these software packages easier to use, we developed an integrated package in Perl. The Perl package was used to analyze RNA-seq data from Populus simonii and P. deltoides under normal and drought stress conditions, both with and without a reference genome, and the corresponding results were compared. The main results of this study are as follows. (1) For general RNA-seq data, an integrated software package called findDEG was developed in Perl by incorporating packages such as Trinity, Cufflinks and StringTie. The new software can analyze RNA-seq data using different methods for computing gene expression abundance and for hypothesis testing of differential gene expression. It also takes other issues into account, including whether a reference genome is available, whether the samples are replicated, and whether the data are paired-end or single-end. findDEG can be downloaded from http://www.bioseqdata.com/findDEG/findDEG.htm. (2) In the genome-based RNA-seq data analysis, the reads of P. simonii and P. deltoides under the two conditions were mapped separately to the reference genome of P. trichocarpa. As a result, more than 70% of the reads were aligned to the genome. We assembled the four sets of reads from P. simonii and P. deltoides under the two conditions separately. In P. simonii, a total of 35,886 transcripts and 28,002 genes were generated under the normal condition, and 36,591 transcripts and 28,825 genes under the drought stress condition. In P. deltoides, a total of 38,678
transcripts and 29,178 genes were obtained under the normal condition, and 41,415 transcripts and 30,693 genes under the drought stress condition. Using an older version of Cufflinks (v2.1.1), we found 33 differentially expressed genes (DEGs) between the normal and drought stress conditions in P. simonii, and 28 DEGs in P. deltoides. In comparison, using the newer version of Cufflinks (v2.2.1), we identified 53 DEGs between the normal and drought stress conditions in P. simonii, and 28 DEGs in P. deltoides. (3) In the genome-free RNA-seq data analysis, we assembled the genes and transcripts of P. simonii and P. deltoides de novo. In P. simonii, a total of 138,936 genes were generated with an N50 length of 1,336 bp, and 231,139 transcripts with an N50 length of 1,737 bp. In P. deltoides, a total of 109,116 genes were generated with an N50 length of 1,582 bp, and 227,490 transcripts with an N50 length of 1,862 bp. We identified 1,641 DEGs and 2,015 differentially expressed transcripts between the normal and drought stress conditions in P. simonii, and 1,752 DEGs and 2,096 differentially expressed transcripts in P. deltoides. (4) The DEGs were also searched against the Gene Ontology (GO) database for functional annotation. DEGs identified in P. simonii and P. deltoides were annotated to GO terms related to drought stress. In summary, the integrated software package developed in this study provides most researchers with a powerful, multifunctional tool for RNA-seq analysis. The practical RNA-seq data analysis in Populus showed that different analytical strategies gave quite different results for the same data. We recommend trying different analytical strategies with findDEG and choosing the best result to describe the mechanisms of gene expression in the studied organism. |
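
The N50 lengths reported above summarize how contiguous a de novo assembly is. As a minimal sketch, assuming the usual definition (sort the sequences from longest to shortest and report the length at which the running total first reaches half of all assembled bases), N50 could be computed as shown below. The function name `n50` and the example lengths are illustrative only; they are not part of findDEG and are not taken from the Populus data.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths (in bp)."""
    if not lengths:
        raise ValueError("lengths must not be empty")
    # Walk from the longest sequence to the shortest.
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first length at which the running sum covers half of
        # the total assembled bases is the N50.
        if running >= half_total:
            return length


if __name__ == "__main__":
    # Hypothetical transcript lengths, for illustration only.
    example_lengths = [2, 3, 4, 5, 6, 7, 8, 9, 10]
    print(n50(example_lengths))
```

In practice the lengths would be read from the assembled FASTA file of genes or transcripts, which is why the gene-level and transcript-level N50 values can differ for the same assembly.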