Statistical Methods For Analyzing Small RNA Sequencing Datasets Based On miRNA/isomiR Profilings

Posted on: 2016-10-13
Degree: Master
Type: Thesis
Country: China
Candidate: H Zhang
Full Text: PDF
GTID: 2180330461493298
Subject: Epidemiology and Health Statistics
Abstract/Summary:
The development of small RNA deep sequencing (sRNA-seq) technologies and the accumulation of publicly available databases have given researchers access to a growing volume of data. Whether these technologies are useful for biomedical research depends on how effectively the resulting sRNA-seq datasets are handled, and a number of bioinformatics methods and software tools have been developed for this purpose. However, because of small sample sizes, recent sRNA-seq studies have mostly relied on descriptive statistics, and the statistical methods they used had many limitations. Here, a computer simulation was conducted to evaluate the statistical properties and effectiveness of four differential expression algorithms for analyzing sRNA-seq data. We also propose, from an application point of view, a statistical analysis strategy based on miRNA/isomiR profiles that draws on both bioinformatics and biostatistics.

In Section Ⅰ, we conducted a simulation based on the negative binomial distribution with parameters derived from real data, and thoroughly evaluated the type Ⅰ error and statistical power of four feature selection methods (baySeq, DESeq, edgeR and permutation):
(1) baySeq, DESeq and permutation controlled the type Ⅰ error at the significance level of 0.05, but permutation failed after Bonferroni correction. baySeq controlled the type Ⅰ error extremely strictly, whereas the type Ⅰ error of edgeR was slightly inflated in every scenario.
(2) With the other parameters fixed, the power of each algorithm increased as the mean difference between cases and controls increased or as the size factor of the case group decreased, whereas the proportion of negative genes π0 was unrelated to power. Under the same parameters, baySeq, DESeq and edgeR had higher power than permutation.

In addition, a statistical strategy, “data pre-processing → differential expression analysis → hierarchical clustering → functional enrichment analysis”, was proposed and applied to small RNA sequencing datasets (BRCA).
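The Section Ⅰ simulation design described above can be illustrated with a minimal sketch. It is not the thesis code: the means, dispersion, group size, number of replicates and the choice of a mean-difference permutation statistic are all assumptions made for illustration, size factors are not modelled, and baySeq, DESeq and edgeR are not reproduced here. Counts are drawn from a negative binomial distribution through its gamma-Poisson mixture form, and the rejection rate of a permutation test is estimated once under the null (type Ⅰ error) and once under a mean shift (power).

```python
import math
import random


def poisson(lam, rng):
    """Draw one Poisson(lam) variate with Knuth's method (adequate for modest lam)."""
    threshold = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


def neg_binomial(mean, dispersion, rng):
    """Gamma-Poisson mixture: variance = mean + dispersion * mean**2."""
    shape = 1.0 / dispersion
    lam = rng.gammavariate(shape, mean * dispersion)  # gamma mean equals `mean`
    return poisson(lam, rng)


def permutation_pvalue(case, control, n_perm, rng):
    """Two-sided permutation p-value for the difference in group means."""
    observed = abs(sum(case) / len(case) - sum(control) / len(control))
    pooled = case + control
    n_case = len(case)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        a, b = pooled[:n_case], pooled[n_case:]
        if abs(sum(a) / len(a) - sum(b) / len(b)) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)  # add-one correction keeps p > 0


def rejection_rate(mean_control, mean_case, dispersion, n_per_group,
                   n_genes, n_perm, alpha, seed):
    """Fraction of simulated genes with permutation p-value <= alpha."""
    rng = random.Random(seed)
    rejected = 0
    for _ in range(n_genes):
        control = [neg_binomial(mean_control, dispersion, rng)
                   for _ in range(n_per_group)]
        case = [neg_binomial(mean_case, dispersion, rng)
                for _ in range(n_per_group)]
        if permutation_pvalue(case, control, n_perm, rng) <= alpha:
            rejected += 1
    return rejected / n_genes


# All parameter values below are hypothetical, chosen only to keep the run short.
type1 = rejection_rate(20, 20, 0.1, 6, 200, 199, 0.05, seed=1)  # equal means: null
power = rejection_rate(20, 80, 0.1, 6, 200, 199, 0.05, seed=2)  # shifted mean
print("estimated type I error:", type1)
print("estimated power:", power)
```

With equal group means the rejection rate estimates the type Ⅰ error, and with shifted means it estimates power. As a general property of permutation tests, the smallest attainable p-value is bounded below by the number of permutations and the group sizes, which matters once a multiple-testing correction lowers the threshold.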
In the BRCA analysis, under the criterion of P-value ≤ 0.05 alone, the differentially expressed (DE) miRNAs detected by the four algorithms differed considerably; after adding the criterion |log2(FC)| ≥ 2, their results tended to be consistent. Fifteen miRNAs were deregulated in tumor tissues compared with normal tissues. The putative target mRNAs of these DE miRNAs were significantly enriched in several cancer-related pathways. Moreover, the heatmap showed that miRNAs with similar expression signatures grouped together and that tumor tissues were separated from normal tissues.

In Section Ⅱ, “arm switching” and isomiR expression patterns were also explored, and the sample classification performance of three features derived from isomiR profiles was compared. Because the sequencing data were extremely skewed and over-dispersed, we recommend a nonparametric test for detecting pre-miRNAs with “arm switching” across tissues and/or conditions (cases and controls), and rank-based MANOVA for comparing isomiR expression patterns between normal and tumor samples. Multiple DE isomiRs performed well in sample classification compared with total miRNA or annotated miRNA, suggesting that researchers should attend not only to the differential expression of miRNAs but also to the variability of isomiR profiles.

In conclusion, three suggestions can be made:
1. The DESeq algorithm maintains a reasonable false-positive rate without any loss of power and is applicable to differential expression analysis in miRNA deep sequencing research.
2. The strategy “data pre-processing → differential expression analysis → hierarchical clustering → functional enrichment analysis” is appropriate for analyzing deep sequencing data.
3. Rank-based MANOVA is an effective method for comparing isomiR expression patterns across diverse tissues or between two conditions, taking the variability of isomiR profiles into account.

These strategies and methods answered biologists’ questions about analyzing small RNA sequencing data quite well and deserve wider exploration in future work.
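The two-part selection rule used for DE miRNAs (P-value ≤ 0.05 together with |log2(FC)| ≥ 2) can be sketched as follows. The miRNA names, p-values, fold changes and method labels are made-up placeholders and are not results from the thesis; the helper `select_de` is likewise hypothetical. The sketch only shows how adding a fold-change cut-off on top of a p-value cut-off can bring the sets called by different methods closer together.

```python
def select_de(p_by_mirna, log2fc, alpha=0.05, min_abs_log2fc=2.0):
    """Return the names passing both the p-value and the |log2(FC)| cut-off."""
    return {
        name
        for name, p in p_by_mirna.items()
        if p <= alpha and abs(log2fc[name]) >= min_abs_log2fc
    }


# Hypothetical log2 fold changes (tumor vs normal); placeholder names.
log2fc = {"miR-A": 3.1, "miR-B": 0.8, "miR-C": 1.5, "miR-D": -2.4}

# Hypothetical p-values from two unnamed methods applied to the same data.
p_values = {
    "method_1": {"miR-A": 0.001, "miR-B": 0.030, "miR-C": 0.200, "miR-D": 0.004},
    "method_2": {"miR-A": 0.010, "miR-B": 0.200, "miR-C": 0.040, "miR-D": 0.020},
}

# P-value criterion only (fold-change cut-off disabled).
by_p_only = {m: select_de(p, log2fc, min_abs_log2fc=0.0)
             for m, p in p_values.items()}

# Both criteria together.
by_both = {m: select_de(p, log2fc) for m, p in p_values.items()}

# miRNAs called by every method under both criteria.
consensus = set.intersection(*by_both.values())

print("p-value only:", by_p_only)
print("p-value and fold change:", by_both)
print("consensus:", sorted(consensus))
```

In this toy input the two methods disagree under the p-value criterion alone and agree once the fold-change criterion is added; real data need not behave this cleanly.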
Keywords/Search Tags: sRNA-seq datasets, differential expression, statistical analysis strategies, “arm switching”, discriminant analysis, isomiR expression patterns