
Two Application Studies of Variance Component Test in RNA-Seq Data

Posted on: 2017-02-21
Degree: Doctor
Type: Dissertation
Country: China
Candidate: S Yang
GTID: 1100330485462681
Subject: Epidemiology and Health Statistics
Abstract/Summary:
With the wide availability and rapid development of RNA sequencing (RNA-Seq) technology, it is now widely used to uncover relationships between molecules and complex diseases. Because disease mechanisms are complex, emerging studies focus not only on selecting differentially expressed (DE) RNAs but also on relating molecules to biological knowledge and to expression profiles. These two aspects pose major challenges for biostatistics and systems biology.

In Part I, following the idea of a set test, we build a model for selecting DE mRNAs from isoform expression data. The model treats the isoform counts of one mRNA as the response variable, assumed to follow a Poisson or negative binomial (NB) distribution. The first random effect represents the similarity of, or relationship among, the isoforms; the second random effect is associated with the label (group) variable. Testing a set of isoforms is therefore transformed into a variance component test of the second random effect within the framework of the generalized linear mixed model. Under H0, we construct the score test statistic for the second random effect; this statistic approximately follows a mixture of chi-square distributions. We also build an empirical distribution of the statistic by permutation. Simulations examine the statistical properties of the theoretical and empirical distributions and compare this method with traditional methods. The real data are mRNA sequencing data for lung squamous cell carcinoma (LUSC) downloaded from TCGA (The Cancer Genome Atlas).

The results of Part I are as follows. This method approximately controls the Type I error, whereas the empirical (permutation) distribution controls it strictly. The Type I error of the traditional methods deviates from the nominal level to varying degrees. In the power simulations, this method outperforms the traditional methods both when the label effects on the isoforms point in the same direction and when they point in different directions.
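A minimal Python sketch of the Part I score test follows, under simplifying assumptions that are ours and not the dissertation's: only the Poisson case is covered; the first (isoform-similarity) random effect is dropped from the null model and replaced by a hypothetical similarity matrix `R` that sets the covariance of the isoform-specific label effects; and the mixture-of-chi-squares null is approximated by Satterthwaite moment matching rather than an exact method such as Davies'. The statistic is the SKAT-style quadratic form Q = (y - mu)' Z R Z' (y - mu), whose mixture weights under H0 are the eigenvalues of L' P0 L with L = Z R^{1/2}. The function names are hypothetical, and NumPy and SciPy are assumed to be available.

```python
import numpy as np
from scipy.stats import chi2


def satterthwaite_pvalue(q, lam):
    """Approximate P(sum_k lam_k * chi2_1 >= q) by a scaled chi-square a * chi2_d
    whose mean and variance match those of the mixture."""
    lam = lam[lam > 1e-10]
    a = np.sum(lam**2) / np.sum(lam)
    d = np.sum(lam) ** 2 / np.sum(lam**2)
    return float(chi2.sf(q / a, d))


def isoform_set_score_test(counts, labels, R=None, n_perm=999, seed=0):
    """Poisson variance-component score test for the isoforms of one mRNA.

    counts : (n_samples, n_isoforms) isoform read counts
    labels : (n_samples,) 0/1 group labels
    R      : hypothetical (n_isoforms, n_isoforms) isoform-similarity matrix used
             as the covariance of the label effects (identity if None)
    """
    y = np.asarray(counts, dtype=float)
    g = np.asarray(labels, dtype=float)
    n, m = y.shape
    if np.any(y.sum(axis=0) == 0):
        raise ValueError("every isoform needs at least one read")
    R = np.eye(m) if R is None else np.asarray(R, dtype=float)

    # Null model (assumption: no isoform-similarity random effect): a Poisson GLM
    # with one intercept per isoform, so each fitted mean is the isoform's average.
    mu = np.tile(y.mean(axis=0), (n, 1)).ravel()    # observation index i*m + j
    resid = y.ravel() - mu
    X = np.tile(np.eye(m), (n, 1))                  # isoform indicators
    V = np.diag(mu)                                 # Poisson variance
    VX = V @ X
    P0 = V - VX @ np.linalg.solve(X.T @ VX, VX.T)   # null residual covariance

    w, U = np.linalg.eigh(R)
    R_half = U @ np.diag(np.sqrt(np.clip(w, 0.0, None))) @ U.T

    def loading(gvec):
        # Z[i*m + j, j] = g_i maps isoform-specific label effects onto
        # observations; L = Z R^{1/2}, so K = Z R Z' = L L'.
        return np.kron(gvec.reshape(-1, 1), np.eye(m)) @ R_half

    L = loading(g)
    u = L.T @ resid
    q = float(u @ u)                                # Q = resid' K resid
    lam = np.linalg.eigvalsh(L.T @ P0 @ L)          # mixture weights under H0
    p_mix = satterthwaite_pvalue(q, lam)

    # Empirical null by permutation: the null fit ignores the labels,
    # so only the labels are reshuffled.
    rng = np.random.default_rng(seed)
    q_perm = np.empty(n_perm)
    for b in range(n_perm):
        u_b = loading(rng.permutation(g)).T @ resid
        q_perm[b] = u_b @ u_b
    p_perm = (1 + np.sum(q_perm >= q)) / (n_perm + 1)
    return {"Q": q, "p_mixchisq": p_mix, "p_perm": float(p_perm)}
```

For example, `isoform_set_score_test(counts, labels, n_perm=199)` on a 40 x 3 count matrix returns both the approximate mixture p-value and the permutation p-value. Because the null model does not involve the labels, the permutation loop reuses the same residuals instead of refitting.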
Different distributional assumptions for the RNA-Seq data may slightly influence the results: results under the Poisson assumption are better than those under the NB assumption. Under the NB assumption, the power of the traditional methods is low, whereas that of this method is higher. In the real data analysis, this method uniquely identifies 17 DE mRNAs, 3 of which are in Batch 101.

In Part II, based on the correspondence between mixed effect models and kernel machines, we build a garrote kernel model for the first-order interaction between mRNA and miRNA with a binary phenotype (BGKM). Testing the interaction is thereby transformed into testing the variance component of a random effect in the mixed model. Under H0, we construct the score test statistic for the garrote parameter, which follows a mixture of chi-square distributions. Simulations compare the statistical properties of this method with those of the traditional F-test. The real data analysis uses datasets of known biological relationships between the two types of molecules together with sequencing data for breast invasive carcinoma (BRCA) from TCGA, and some interactions are identified from both biological and statistical perspectives.

The results of Part II are as follows. BGKM strictly controls the Type I error under some parameter settings. In contrast, the Type I error of the F-test deviates severely from the nominal level in many settings, so the F-test is unsuitable for analyzing high-dimensional interaction data. Power simulations indicate that a large difference between the model without interactions and the full model may lead to high power. Whether the nonparametric component of the model is assumed to be linear or nonlinear has little bearing on power, which suggests that the model is suitable for detecting associations between complex molecular interactions and disease. In the real data analysis, BGKM identifies 13,710 mRNA-miRNA interactions.
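The Part II idea, testing an interaction through the variance component of a kernel, can be sketched in Python in the same way. This is not the garrote kernel model itself: as a simplifying assumption, the mRNA and miRNA main effects enter the null logistic model as fixed linear terms, and the interaction kernel is the Hadamard product of two linear kernels, K12 = (Z1 Z1') o (Z2 Z2'). Under H0 the score statistic Q = (y - mu)' K12 (y - mu) is referred to a mixture of chi-squares through Satterthwaite's approximation. The function names are hypothetical, and NumPy and SciPy are assumed to be available.

```python
import numpy as np
from scipy.stats import chi2


def fit_logistic_null(X, y, n_iter=50, tol=1e-10):
    """Newton-Raphson (IRLS) fit of the null model logit(mu) = X beta."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        mu = 1.0 / (1.0 + np.exp(-X @ beta))
        w = mu * (1.0 - mu)
        step = np.linalg.solve(X.T @ (X * w[:, None]), X.T @ (y - mu))
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    return 1.0 / (1.0 + np.exp(-X @ beta))


def interaction_score_test(z_mrna, z_mirna, y, covariates=None):
    """Variance-component score test for an mRNA x miRNA interaction kernel
    with a binary (0/1) phenotype y.

    Assumption: main effects are fixed linear terms (not garrote kernels)."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    z1 = np.asarray(z_mrna, dtype=float).reshape(n, -1)
    z2 = np.asarray(z_mirna, dtype=float).reshape(n, -1)
    cols = [np.ones((n, 1)), z1, z2]
    if covariates is not None:
        cols.append(np.asarray(covariates, dtype=float).reshape(n, -1))
    X = np.hstack(cols)

    mu = fit_logistic_null(X, y)
    V = mu * (1.0 - mu)                    # Bernoulli variance under H0
    # Row-wise Kronecker products of the two feature sets, so that
    # W W' = (Z1 Z1') o (Z2 Z2'), the Hadamard product of linear kernels.
    W = np.einsum("ij,ik->ijk", z1, z2).reshape(n, -1)

    u = W.T @ (y - mu)
    q = float(u @ u)                       # Q = (y - mu)' K12 (y - mu)
    VX = X * V[:, None]
    P0 = np.diag(V) - VX @ np.linalg.solve(X.T @ VX, VX.T)
    lam = np.linalg.eigvalsh(W.T @ P0 @ W)
    lam = lam[lam > 1e-10]
    a = np.sum(lam**2) / np.sum(lam)       # Satterthwaite scale
    d = np.sum(lam) ** 2 / np.sum(lam**2)  # Satterthwaite degrees of freedom
    return {"Q": q, "p": float(chi2.sf(q / a, d))}
```

For a single mRNA-miRNA pair, `z_mrna` and `z_mirna` are length-n expression vectors, K12 has rank one, and the test reduces to a one-degree-of-freedom score test for the product term; passing several miRNAs as columns of `z_mirna` yields a genuine mixture of chi-squares.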
Keywords/Search Tags: mixed effect model, variance component test, differentially expressed, interaction