| Objective: Widely used gene set enrichment analysis methods introduce systematic errors when they are applied to data sets with heterogeneity and sample- or patient-specific characteristics. In recent years, therefore, many researchers have designed and developed a series of single-sample gene set analysis methods for identifying pathway activity in a single sample or in heterogeneous samples. For biologists unfamiliar with these methods, choosing the most appropriate one from the existing alternatives is an important question. So far, no study has assessed and compared the existing single-sample gene set analysis tools. The purpose of this study is therefore to compare the sensitivity, specificity and precision of six selected single-sample gene set analysis methods, based on the theory of gene set analysis, and thereby to provide new guidance for selecting data analysis methods. Method: Eight data sets related to respiratory diseases were retrieved from the GEO database to serve as the "gold standard" test data sets of the benchmark. Each selected disease-related data set had to have a gene set (signaling pathway) with a known biological function annotation that could serve as the reference target pathway. Drawing on published biological studies, the statistically significant gene sets reported by each of the six single-sample gene set analysis methods were compared against the known biological evidence (the target pathways), and sensitivity, specificity and precision were calculated to give an objective evaluation. To make the benchmark easy to record, reuse and share, the process was recorded in Jupyter Notebooks, and a dynamic Shiny web tool allows other researchers to compare analyses and select methods. The best-performing gene set analysis method was then applied to two experimental data sets (the COPD-related data set GSE36221 and the COVID-19-related data set GSE147507). Result: In the benchmark, GRAPE and Pathifier outperformed the other tools in sensitivity and precision, while GSVA and ZSCORE performed better in terms of specificity. However, Pathifier's computation time is too long, so we considered GRAPE the best of the six methods. The entire workflow of this benchmark study, together with a web application called "ss-shiny", has been published and is available at https://gsa-central.github.io/benchmarKING. In the case study, GRAPE was explored further in practical application: to a certain extent its results accurately detected disease-related pathways, and they complemented the results of other analysis methods. Conclusion: The performance of single-sample GSA methods is poorly understood. Our benchmark shows that the top methods in terms of precision were different from the top ones in terms of sensitivity. The best overall methods were Pathifier and GRAPE, which agrees with a benchmark study that appeared during the development of this thesis. We have made benchmarking GSA methods a much easier task by creating bioinformatics tools for it, namely Jupyter notebooks and a Shiny app (ss-shiny). The subsequent case analysis shows that single-sample gene set analysis is not the only choice of analysis method: the best-performing method in our benchmark (GRAPE) offers results complementary to those of a traditional web platform (Enrichr) when applied to a complex COPD data set, and GRAPE also offers complementary results for the analysis of COVID-19 data sets. |
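The abstract does not spell out how sensitivity, specificity and precision are derived from a method's significant gene sets. The following is a minimal Python sketch, not the thesis's own code. It assumes that, for one data set, the known target pathways are the positives and every other tested gene set counts as a negative. That is an assumption: the benchmark may define negatives differently, for example through permuted or null data sets. The function name `benchmark_metrics` and the pathway labels in the usage example are hypothetical.

```python
from typing import Dict, Set


def benchmark_metrics(tested: Set[str], significant: Set[str],
                      targets: Set[str]) -> Dict[str, float]:
    """Confusion-matrix metrics for one method on one data set (hypothetical helper).

    Assumption: target pathways are positives; all other tested gene sets
    are treated as negatives.
    """
    positives = targets & tested           # target pathways the method actually scored
    negatives = tested - targets           # everything else is assumed to be a true negative

    tp = len(significant & positives)      # target pathways called significant
    fp = len(significant & negatives)      # non-target gene sets called significant
    fn = len(positives - significant)      # target pathways missed
    tn = len(negatives - significant)      # non-target gene sets correctly not called

    def ratio(num: int, den: int) -> float:
        # Return NaN instead of raising when a denominator is empty.
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "precision": ratio(tp, tp + fp),
    }


if __name__ == "__main__":
    # Hypothetical example: five tested gene sets, two target pathways.
    metrics = benchmark_metrics(
        tested={"P1", "P2", "P3", "P4", "P5"},
        significant={"P1", "P3"},
        targets={"P1", "P2"},
    )
    print(metrics)
```

In this example one target pathway is found and one is missed, and one of three non-target gene sets is called significant. The result is a sensitivity of 0.5, a specificity of 2/3 and a precision of 0.5. Per-data-set values like these could then be averaged across the eight data sets to rank methods, though the thesis's exact aggregation is not stated here.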