
A Normalization Method Based On Variance And Median Adjustment For RNA-Seq Data And Its Evaluation

Posted on: 2015-01-02 | Degree: Master | Type: Thesis
Country: China | Candidate: Y Wang
GTID: 2250330428460073 | Subject: Systems Engineering
Abstract/Summary:
With the rapid development of next-generation sequencing, RNA sequencing (RNA-Seq) has become widely used to analyze the transcriptomes of many organisms. Because different sequencing libraries are produced on different sequencing lanes and have different sequencing depths, they cannot be compared directly. The libraries must therefore be normalized to adjust for differences in the total read counts of the sequencing lanes, to remove technical errors introduced during the sequencing experiment, and to allow more accurate identification of differentially expressed genes.
This thesis proposes the minimum variance and median normalization method, which normalizes RNA-Seq datasets through variance and median adjustment. The method accounts both for the global expression level of all genes in a library and for the influence of each individual gene's expression. It is applied to Arabidopsis polyadenylation [poly(A)] site and gene datasets. First, a variance is calculated with the geometric mean method, and a mean variance is calculated with the weighted trimmed mean method. These two variances are then combined for each sample to obtain an optimal variance. Finally, median adjustment is applied to all samples from the sequencing libraries after the variance adjustment, which completes the normalization of the datasets.
The minimum variance and median normalization method is evaluated using the data distributions of different samples, empirical statistical metrics [mean square error (MSE) and the Kolmogorov-Smirnov (K-S) statistic], differential expression analysis, and other criteria. It is also compared comprehensively with two existing normalization methods, DESeq and TMM (trimmed mean of M-values).
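The abstract names a geometric-mean normalization and a final median adjustment but gives no formulas. The sketch below illustrates those two building blocks under stated assumptions; it is not the thesis's minimum variance method, and it omits both the weighted trimmed mean (TMM) step and the combination of variances. It computes DESeq-style median-of-ratios size factors (each sample's factor is the median, over genes with non-zero counts in every sample, of its count divided by that gene's geometric mean across samples) and then applies a median adjustment. The function names, and the choice to rescale every sample multiplicatively to the mean of the per-sample medians, are hypothetical.

```python
import math
from statistics import median


def median_of_ratios_size_factors(counts):
    """DESeq-style size factors (median of ratios to a geometric-mean reference).

    counts: list of samples; each sample is a list of non-negative read counts,
    with genes in the same order in every sample.
    """
    n_samples = len(counts)
    n_genes = len(counts[0])
    # Only genes with a non-zero count in every sample have a defined geometric mean.
    usable = [g for g in range(n_genes) if all(s[g] > 0 for s in counts)]
    if not usable:
        raise ValueError("no gene has a non-zero count in every sample")
    # Pseudo-reference sample: per-gene geometric mean across all samples.
    geo_mean = {
        g: math.exp(sum(math.log(s[g]) for s in counts) / n_samples) for g in usable
    }
    # Size factor of a sample = median of its per-gene ratios to the reference.
    return [median([s[g] / geo_mean[g] for g in usable]) for s in counts]


def scale(counts, factors):
    """Divide every count in a sample by that sample's size factor."""
    return [[c / f for c in s] for s, f in zip(counts, factors)]


def median_adjust(samples):
    """Hypothetical median adjustment: rescale each sample so that its median
    equals the mean of the per-sample medians (medians assumed positive)."""
    medians = [median(s) for s in samples]
    target = sum(medians) / len(medians)
    return [[x * target / m for x in s] for s, m in zip(samples, medians)]


counts = [
    [10, 20, 30, 40],  # sample A
    [20, 40, 60, 80],  # sample B: same expression profile at twice the depth
]
factors = median_of_ratios_size_factors(counts)
normalized = median_adjust(scale(counts, factors))
print(factors)
print(normalized)
```

Because sample B is sample A sequenced at twice the depth, the size factors come out as roughly 0.707 and 1.414, and after scaling the two rows are identical, so the median adjustment leaves them unchanged.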
The experimental results show that the minimum variance and median normalization method can effectively normalize high-throughput RNA datasets under different conditions: after normalization, every sample in the sequencing library has the same data distribution, all sequencing samples are brought to the same level, and the overall expression differences of genes and poly(A) sites across samples are reduced.
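The evaluation relies on MSE and the Kolmogorov-Smirnov statistic, but the abstract does not say exactly which values are compared. The sketch below assumes, for illustration only, that two normalized samples are compared gene by gene on a log2(x + 1) scale: MSE over paired genes, and the two-sample K-S statistic D = max |F_a(x) - F_b(x)| between their empirical distribution functions. Samples with the same data distribution after normalization should give values near zero for both. The log transform and the function names are assumptions; if SciPy is available, `scipy.stats.ks_2samp` computes the same two-sample statistic.

```python
import bisect
import math


def log2p1(sample):
    """Log-transform counts as log2(x + 1), an assumed scale for comparison."""
    return [math.log2(x + 1) for x in sample]


def mse(a, b):
    """Mean squared difference between paired values (same gene order)."""
    if len(a) != len(b):
        raise ValueError("samples must have the same number of genes")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def ks_statistic(a, b):
    """Two-sample K-S statistic: the largest gap between the two empirical CDFs."""
    a, b = sorted(a), sorted(b)
    d = 0.0
    # The supremum of |F_a - F_b| is reached at one of the observed values.
    for x in a + b:
        f_a = bisect.bisect_right(a, x) / len(a)
        f_b = bisect.bisect_right(b, x) / len(b)
        d = max(d, abs(f_a - f_b))
    return d


sample_a = log2p1([14, 28, 42, 57])
sample_b = log2p1([15, 27, 44, 55])
print(mse(sample_a, sample_b), ks_statistic(sample_a, sample_b))
```

Evaluating the empirical CDFs only at observed values keeps the K-S computation exact without a grid; the loop is quadratic-log in the worst case, which is adequate for a sketch but not for whole-genome comparisons.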
Keywords/Search Tags: RNA Sequencing, Normalization Method, Evaluation