| Background: Prostate cancer is the second most common cancer and the fifth leading cause of cancer death in men. There were nearly 1.4 million new cases and more than 370,000 deaths globally in 2020. With socio-economic development, increased life expectancy, and a progressively westernized lifestyle, the incidence of prostate cancer has increased significantly in China, and the age at onset has advanced. It is therefore necessary to further explore differential gene expression in prostate cancer and to study the mechanism of carcinogenesis in depth. Traditional experimental methods can hardly meet current research needs because of their long experimental cycles and high cost. Single-cell RNA sequencing allows quantitative measurement and comparison of gene expression at single-cell resolution, providing an opportunity to study distinct gene expression patterns in seemingly homogeneous cell populations in cancer and para-cancerous tissues. However, traditional bioinformatics methods can only analyze associations between differentially expressed genes and exposures, so it is difficult to distinguish perturbations of transcriptional regulatory relationships from true biological variation. Targeted maximum likelihood estimation (TMLE) is a doubly robust, semi-parametric estimation method based on maximum likelihood estimation for causal association analysis. It consists of two estimation stages. First, an adaptive ensemble model is built with the super learner algorithm. Next, the ensemble model is corrected in a targeted manner to reach a local optimum, yielding an optimal counterfactual prediction model from which the average causal effect is estimated. The method can accommodate zero-inflated, high-dimensional single-cell sequencing data and can be used to screen for causal genes of prostate cancer. Objective: Taking advantage of single-cell RNA sequencing data, a causal association analysis method was applied to identify the causal effects of genes on the carcinogenesis of prostate
cancer in different cell clusters. This study aims to provide a reference for experimental research that verifies differentially expressed genes and pathogenic mechanisms of prostate cancer, and for the development of targeted therapies. Methods: Single-cell RNA sequencing was performed on cancer tissues and para-cancerous tissues from 3 prostate cancer patients using the 10x Genomics platform, and the expression levels of 26,392 genes were obtained in 56,893 cells. A gene expression matrix was constructed with cells as the sample units and genes as the variables. Cells derived from cancer tissue were defined as the "case" group, and cells derived from para-cancerous tissue were defined as the "control" group. Routine bioinformatics analysis was performed on this matrix first: the single-cell RNA sequencing data were integrated, cell clustering and cell annotation were performed after preprocessing to remove batch effects, and KEGG enrichment analysis was used to locate the key cell clusters of prostate cancer. The SCDE method was then used to identify differentially expressed genes between cells derived from cancer tissues and para-cancerous tissues in these cell clusters. The TMLE model was further used to screen for genes with causal effects on prostate cancer. For potential causal genes, KEGG enrichment analysis was used to search for signaling pathways that may affect the carcinogenesis of prostate cancer and to explore its pathogenesis. Results: 1. Analysis of differentially expressed genes using bioinformatics methods: ① Twenty cell subsets were identified and their cell types were annotated. Three key cell clusters of prostate carcinogenesis were then located by KEGG enrichment analysis: the type 1 prostate luminal epithelial cell cluster, the type 2 prostate luminal epithelial cell cluster, and the CD8+ T cell cluster. ② The SCDE method was used to identify differentially expressed genes between cells derived from cancer tissues and cells from para-cancerous tissues in the 3 cell clusters. A total of 119 differentially expressed genes were found in the
type 1 prostate luminal epithelial cell cluster, 40 in the type 2 prostate luminal epithelial cell cluster, and 68 in the CD8+ T cell cluster. 2. The targeted maximum likelihood estimation model was used to screen for genes with causal effects on carcinogenesis. It identified 66 potential causal genes in the type 1 prostate luminal epithelial cell cluster, 32 in the type 2 prostate luminal epithelial cell cluster, and 47 in the CD8+ T cell cluster. The potential causal gene shared by all three cell clusters was CRISP3, and its average causal effect was relatively large. The type 1 and type 2 prostate luminal epithelial cell clusters shared 7 potential causal genes, 5 of which were established prostate cancer marker genes. 3. Enrichment analysis was performed on the potential causal genes. Potential causal genes of the type 1 prostate luminal epithelial cell cluster were enriched in 6 cancer-related pathways: arginine and proline metabolism, protein processing in endoplasmic reticulum, neurotrophin signaling pathway, mineral absorption, PPAR signaling pathway, and PD-L1 expression and PD-1 checkpoint pathway. Potential causal genes of the type 2 prostate luminal epithelial cell cluster were enriched in 4 cancer-related pathways: focal adhesion, MAPK signaling pathway, PI3K-Akt signaling pathway, and ECM-receptor interaction. Potential causal genes of the CD8+ T cell cluster were enriched in 3 cancer-related pathways: antigen processing and presentation, T cell receptor signaling pathway, and IL-17 signaling pathway. Conclusions: 1. Conventional bioinformatics methods can only establish correlations between differentially expressed genes and exposures, and they yield a large number of differentially expressed genes, which hinders experimental verification. Building on the conventional
bioinformatics analysis, this study further used the TMLE model to screen for genes with potential causal effects on the carcinogenesis of prostate cancer, effectively focusing on the key oncogenes of prostate cancer and narrowing the range of targets for experimental research. This approach can help reduce experimental cost and duration, improve the success rate of translation into clinical research, and provide a methodological reference for causal association analysis in genomics. 2. By analyzing the potential causal genes shared between cell clusters, this study identified two genes (VGLL3 and Lnc-HPS3-3) that may be important for prostate cancer development. This provides a reference for further experimental research on the pathogenic mechanism of prostate cancer and for the development of new therapeutic targets. 3. In the type 1 prostate luminal epithelial cell cluster, 18 potential causal genes were enriched in pathways, 15 of which have been confirmed to be associated with cancer; these genes may lead to carcinogenesis by altering metabolic characteristics. In the type 2 prostate luminal epithelial cell cluster, 6 potential causal genes were enriched in pathways, 3 of which have been confirmed to be associated with cancer; these genes may promote cancer cell migration. In the CD8+ T cell cluster, 16 potential causal genes were enriched in pathways, 8 of which have been confirmed to be associated with cancer and may assist cancer cells in immune escape. This study is significant for refining the understanding of the pathological mechanism of prostate cancer and for developing precision treatment. |
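
The two-stage TMLE procedure outlined in the Background can be sketched as follows. This is a minimal illustration under stated assumptions, not the pipeline used in the study. It assumes a binary exposure `A` (for example, whether a candidate gene is expressed in a cell), a binary outcome `Y` (cell from cancer versus para-cancerous tissue), and a covariate matrix `W`; this variable coding is an assumption, since the abstract does not specify it. Plain logistic regression stands in for the super learner to keep the sketch dependency-free, and all function names and the synthetic data are hypothetical.

```python
import numpy as np


def expit(x):
    """Inverse logit."""
    return 1.0 / (1.0 + np.exp(-x))


def logit(p):
    """Log-odds of a probability."""
    return np.log(p / (1.0 - p))


def fit_logistic(X, y, offset=None, n_iter=50, ridge=1e-8):
    """Newton-Raphson logistic regression with an optional fixed offset."""
    n, p = X.shape
    offset = np.zeros(n) if offset is None else offset
    beta = np.zeros(p)
    for _ in range(n_iter):
        mu = expit(X @ beta + offset)
        grad = X.T @ (y - mu)
        hess = (X * (mu * (1.0 - mu))[:, None]).T @ X + ridge * np.eye(p)
        step = np.linalg.solve(hess, grad)
        beta += step
        if np.max(np.abs(step)) < 1e-10:
            break
    return beta


def tmle_ate(W, A, Y, bound=0.01):
    """Average causal effect of binary A on binary Y, adjusting for W."""
    n = len(Y)
    ones = np.ones((n, 1))

    # Stage 1: initial outcome model Q(A, W) = P(Y = 1 | A, W).
    # (The source uses a super learner here; logistic regression is a stand-in.)
    beta_q = fit_logistic(np.hstack([ones, A[:, None], W]), Y)
    q1 = expit(np.hstack([ones, np.ones((n, 1)), W]) @ beta_q)
    q0 = expit(np.hstack([ones, np.zeros((n, 1)), W]) @ beta_q)
    q1 = np.clip(q1, 1e-6, 1 - 1e-6)
    q0 = np.clip(q0, 1e-6, 1 - 1e-6)
    qa = np.where(A == 1, q1, q0)

    # Exposure model g(W) = P(A = 1 | W), bounded away from 0 and 1.
    xg = np.hstack([ones, W])
    g = np.clip(expit(xg @ fit_logistic(xg, A)), bound, 1 - bound)

    # Stage 2: targeting step. Regress Y on the "clever covariate" H
    # with the initial prediction as a fixed offset to obtain epsilon.
    h = A / g - (1.0 - A) / (1.0 - g)
    eps = fit_logistic(h[:, None], Y, offset=logit(qa))[0]

    # Updated counterfactual predictions and their mean difference.
    q1_star = expit(logit(q1) + eps / g)
    q0_star = expit(logit(q0) - eps / (1.0 - g))
    return float(np.mean(q1_star - q0_star))


def simulate(n=20000, seed=0):
    """Synthetic confounded data with a known average causal effect."""
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(n, 2))
    A = rng.binomial(1, expit(0.5 * W[:, 0])).astype(float)
    Y = rng.binomial(1, expit(-0.5 + 1.0 * A + 0.8 * W[:, 0])).astype(float)
    true_ate = float(np.mean(expit(0.5 + 0.8 * W[:, 0])
                             - expit(-0.5 + 0.8 * W[:, 0])))
    return W, A, Y, true_ate


if __name__ == "__main__":
    W, A, Y, true_ate = simulate()
    print("TMLE estimate:", tmle_ate(W, A, Y))
    print("True effect:  ", true_ate)
```

In a gene-screening setting, a function like `tmle_ate` would be called once per candidate gene within each cell cluster, and genes would then be ranked by the estimated average causal effect. The bounding of `g` and the choice of learners would need tuning for zero-inflated expression data.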