Study and Application of a Differential Expression Analysis Method Based on Relative Expression Orderings

Posted on: 2022-08-11 | Degree: Master | Type: Thesis
Country: China | Candidate: Q H Zeng
GTID: 2480306554977319 | Subject: Bioinformatics
Abstract/Summary:
Background: Identifying differentially expressed genes (DEGs) between two conditions or phenotypes is a basic task in the analysis of high-throughput gene expression profiles. Our laboratory previously developed the RankComp algorithm, which is based on the within-sample relative expression orderings (REOs) of gene pairs, can identify DEGs at both the individual and the population level, and is insensitive to batch effects. RankComp applies Fisher's exact test to a contingency table to assess whether the distribution of stable gene pairs in the two groups is significantly associated with the condition or phenotype. However, this ignores that the REOs of the same gene pairs observed in the two groups form a paired (matched) design, whereas Fisher's exact test assumes independent observations and is therefore not appropriate here.

Method: In this study, RankComp was optimized into an improved algorithm, RankCompV3, in which the McNemar-Bowker test replaces Fisher's exact test. The algorithm was further extended to DEG identification in single-cell transcriptomes. Its performance was evaluated and its applications were explored as follows:
(1) The false positive rate of RankCompV3 was evaluated on null datasets from multiple platforms, including gene chips, RNA sequencing (RNA-seq) and single-cell RNA sequencing (scRNA-seq).
(2) RankCompV3 was applied to the gene chip and RNA-seq benchmark datasets MAQC and SEQC, and its AUC was evaluated using TaqMan measurements as the "gold standard".
(3) The true positive rate and true negative rate of RankCompV3 were calculated on simulated scRNA-seq datasets.
(4) RankCompV3 was applied to the scRNA-seq dataset GSE29087, and its precision and AUC were evaluated using the top 1,000 genes obtained from a gene chip dataset as the "gold standard".
(5) RankCompV3 was applied to a breast cancer gene chip dataset with weak differential expression signals, and the functions of the identified DEGs were analyzed.
(6) RankCompV3 was applied to an scRNA-seq dataset of adamantinomatous craniopharyngioma, and functional enrichment analysis was performed on the identified DEGs.

Results: The performance of RankCompV3 in identifying DEGs was evaluated from multiple perspectives on different datasets:
(1) To estimate the false positive rate, DEGs were identified within each sample type of the null datasets. Among the four sample types in the GSE54695 dataset, the highest false positive rate of RankCompV3 was below 0.01%. In a previous study comparing multiple differential expression analysis methods, every algorithm had a higher false positive rate than RankCompV3; the highest, 7.32%, was that of Monocle2.
(2) With the TaqMan measurements as the "gold standard", RankCompV3 reached an AUC of 0.94 on the SEQC dataset. A previous study comparing six differential expression analysis algorithms on SEQC found that Linnorm had the highest AUC, which is still lower than that of RankCompV3.
(3) On simulated datasets, RankCompV3 outperformed a variety of differential expression analysis methods, with higher precision and a very low false positive rate.
(4) On the scRNA-seq dataset GSE29087, performance was evaluated against the top 1,000 genes as the "gold standard". A previous study found that, although various algorithms achieved true positive rates above 0.700, their precision was low (at most 0.091) because each of them called a large number of DEGs (more than 7,500). RankCompV3 identified 587 true DEGs with a precision of 0.105, exceeding the precision of the other algorithms despite their higher true positive rates.
(5) On the breast cancer chip dataset with weak differential expression signals, several commonly used differential expression analysis methods identified no DEGs or only a few, whereas the DEGs identified by RankCompV3 were enriched in cancer-related pathways.
(6) KEGG functional enrichment analysis of the DEGs identified by RankCompV3 in the adamantinomatous craniopharyngioma scRNA-seq dataset showed significant enrichment of cancer-related pathways.

Innovation: Because the REOs of gene pairs in the control group and the experimental group form a matched design, the McNemar-Bowker test is used to identify DEGs, which can reduce the false positive rate of the algorithm. Moreover, unlike RankComp, RankCompV3 is applicable not only to gene chip and RNA-seq datasets but also to scRNA-seq datasets.

Conclusion: This study developed RankCompV3, a novel DEG identification algorithm that applies the McNemar-Bowker test to REOs. RankCompV3 performs well on gene chip, RNA-seq and scRNA-seq datasets and is insensitive to batch effects.
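The statistical change from Fisher's exact test to the McNemar-Bowker test can be illustrated with a minimal sketch. It assumes, hypothetically, that for one gene g every pair (g, x) is assigned one of three REO states in each group (g stably higher, x stably higher, or no stable order) using an illustrative stability threshold `min_fraction`. The abstract does not say how RankCompV3 defines these states or thresholds, so `reo_state`, `reo_table` and `min_fraction` are assumptions, not the thesis implementation. Because the same gene pairs are observed in both groups, the control-by-case cross-tabulation is a paired table, and the McNemar-Bowker symmetry test asks whether pairs move between states in one direction more often than in the other. The chi-square tail probability is computed in closed form for integer degrees of freedom so the example needs only the standard library.

```python
import math
from itertools import combinations

# Hypothetical REO state codes for a gene pair (g, x) within one group (assumption).
G_HIGHER, X_HIGHER, UNSTABLE = 0, 1, 2
N_STATES = 3


def reo_state(expr_g, expr_x, min_fraction=0.95):
    """Assumed rule: the pair is 'stable' if one gene is higher in at least
    min_fraction of the group's samples (the threshold is illustrative only)."""
    n = len(expr_g)
    g_higher = sum(a > b for a, b in zip(expr_g, expr_x))
    x_higher = sum(b > a for a, b in zip(expr_g, expr_x))
    if g_higher >= min_fraction * n:
        return G_HIGHER
    if x_higher >= min_fraction * n:
        return X_HIGHER
    return UNSTABLE


def reo_table(control_states, case_states):
    """Paired cross-tabulation: rows = state in control, columns = state in case."""
    table = [[0] * N_STATES for _ in range(N_STATES)]
    for c, d in zip(control_states, case_states):
        table[c][d] += 1
    return table


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i=0}^{df/2-1} (x/2)^i / i!
        term = total = 1.0
        for i in range(1, df // 2):
            term *= (x / 2) / i
            total += term
        return math.exp(-x / 2) * total
    # Odd df: erfc(sqrt(x/2)) + sqrt(2x/pi) * exp(-x/2) * sum_i x^i / (1*3*...*(2i+1))
    tail = math.erfc(math.sqrt(x / 2))
    if df == 1:
        return tail
    term = total = 1.0
    for i in range(1, (df - 1) // 2):
        term *= x / (2 * i + 1)
        total += term
    return tail + math.sqrt(2 * x / math.pi) * math.exp(-x / 2) * total


def mcnemar_bowker(table):
    """McNemar-Bowker symmetry test on a square table of paired outcomes.
    Returns (statistic, degrees of freedom, p-value)."""
    stat, df = 0.0, 0
    for i, j in combinations(range(len(table)), 2):
        n_ij, n_ji = table[i][j], table[j][i]
        if n_ij + n_ji > 0:  # cell pairs that are both zero carry no information
            stat += (n_ij - n_ji) ** 2 / (n_ij + n_ji)
            df += 1
    p = chi2_sf(stat, df) if df > 0 else 1.0
    return stat, df, p


if __name__ == "__main__":
    # Toy counts of partner genes x for one gene g (not data from the thesis).
    table = [[10, 2, 1],
             [8, 20, 3],
             [4, 5, 30]]
    stat, df, p = mcnemar_bowker(table)
    print(round(stat, 2), df)  # 5.9 3
    print(f"p = {p:.3f}")
```

For the toy table the statistic is (2-8)²/10 + (1-4)²/5 + (3-5)²/8 = 5.9 with 3 degrees of freedom. Unlike a Fisher test on two independent margins, only the off-diagonal cells (pairs whose state changed between groups) contribute. This sketch drops off-diagonal cell pairs that are both zero from the statistic and the degrees of freedom, which is one design choice among several.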
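The evaluation criteria used in the results (false positive rate on null data, true positive and true negative rates on simulated data, precision against a reference gene set, and AUC against a "gold standard") are standard measures. The minimal sketch below shows one way to compute them. The function names, the set-based representation and the toy gene IDs are illustrative assumptions; the abstract does not describe how the thesis computed these metrics or how the TaqMan-based labels were derived.

```python
def evaluate(predicted, gold, universe):
    """Set-based metrics for a DEG call list against a reference ("gold standard") set.
    On a null dataset, pass gold=set(): every call is then a false positive."""
    tp = len(predicted & gold)
    fp = len(predicted - gold)
    fn = len(gold - predicted)
    tn = len(universe - predicted - gold)
    nan = float("nan")  # returned when a metric's denominator is zero
    return {
        "precision": tp / (tp + fp) if tp + fp else nan,
        "tpr": tp / (tp + fn) if tp + fn else nan,  # true positive rate (recall)
        "fpr": fp / (fp + tn) if fp + tn else nan,  # false positive rate
        "tnr": tn / (fp + tn) if fp + tn else nan,  # true negative rate
    }


def auc(scores, labels):
    """ROC AUC as the probability that a random positive outscores a random
    negative (ties count 1/2), i.e. Mann-Whitney U / (n_pos * n_neg)."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    genes = {f"g{i}" for i in range(10)}  # hypothetical gene universe
    print(evaluate({"g0", "g1", "g5"}, {"g0", "g1", "g2"}, genes))
    print(auc([0.9, 0.8, 0.3, 0.2], [1, 0, 1, 0]))  # 0.75
```

Because adding calls can only keep or raise the number of true positives, calling more genes tends to raise the true positive rate while diluting precision = TP / (TP + FP). This is why, in result (4), an algorithm with a lower true positive rate can still have the higher precision.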
Keywords/Search Tags: transcriptome sequencing, gene chip, single-cell transcriptome sequencing, differentially expressed genes, relative expression orderings