| Background: (1) Cell-cell interaction (CCI) refers to communication between cells through signaling molecules and receptors, which mediates physiological processes within and between cells and is crucial for the growth, development, and maintenance of the internal environment of the organism. Abnormal CCIs are closely related to the occurrence of disease. (2) Compared with traditional RNA sequencing, single-cell RNA sequencing captures the gene expression of individual cells in a sample and has been widely used in developmental biology, immunology, and other fields. High-throughput single-cell sequencing makes it possible to study the potentially abundant CCIs between cells in a sample. By mapping a pre-built ligand-receptor database onto the single-cell RNA-seq expression matrix, potential CCIs can be predicted. (3) As biological networks, CCIs can be studied with network analysis methods from bioinformatics. For example, module detection can reveal how closely cells are related, while centrality analysis highlights important cells or genes in the network. Purpose: This project used CCI prediction tools for single-cell data together with a Neo4j graph database to construct a map of CCIs in the human lung, and investigated whether the database was useful for analyzing possible mechanisms in normal lung or lung diseases. To select the best prediction tool, the project first benchmarked existing CCI prediction tools before constructing the database. Method: (1) Benchmark: Existing CCI prediction tools were collected, and a subset was selected for the benchmark according to selection criteria. CCIs associated with idiopathic pulmonary fibrosis (IPF) were collected from the literature as the gold standard for the benchmark. Single-cell datasets associated with IPF were collected from GEO and processed with R. The entire benchmarking workflow was written and run in a Jupyter Notebook: CCIs were predicted using all qualified tools, and the predictions were then converted to the "source-target-ligand-receptor" format. Predicted CCIs were combined with the gold standard into an "all possible pair set", from which the precision, sensitivity, and specificity of each tool were obtained. The running time of each tool was also recorded. (2) Construction of the Neo4j database: Single-cell RNA sequencing datasets from healthy human lung or lung diseases were collected from GEO and processed for CCI prediction using the best-performing tools in the benchmark. Predictions were then stored in the Neo4j graph database, and the database was subjected to module identification (Louvain algorithm) and centrality analysis (degree and PageRank algorithms) using the GDS plugin for Neo4j. Result: (1) Benchmark: After identifying all available and functional tools, the following were benchmarked: CellChat, CellPhoneDB, CCInx, iTALK, NATMI, SingleCellSignalR, and scMLnet. The running time of scMLnet and CellPhoneDB was significantly higher than that of the other tools. The performance metrics were as follows: top 3 in accuracy: CellPhoneDB, NATMI, and scMLnet; top 3 in sensitivity: iTALK, CellPhoneDB, and NATMI; top 3 in specificity: CellPhoneDB, CellChat, and scMLnet; top 3 in F1 score: CellPhoneDB, NATMI, and scMLnet; top 3 in Matthews correlation coefficient: CellPhoneDB, NATMI, and iTALK. (2) Construction of the Neo4j database: Single-cell RNA sequencing data for healthy lung, COVID-19, IPF, non-small cell lung cancer, and other conditions were collected. CCIs were predicted using CellPhoneDB. With the Louvain algorithm, most samples were divided into two major modules, one immune cell-dominant and one non-immune cell-dominant, while some immune cells in the tumor samples fell into the same module as the non-immune cells because of an increased number of interactions. In the centrality analysis of the IPF dataset, PageRank scores of AT1 cells were increased in the disease group. In the analysis of a subnetwork focused on AT1 cells, PageRank scores of TGFB1, TNF, and TNFSF10 were found to be elevated in the disease group, and CCIs from TGFB1 and MIF to EGFR were also increased in this subnetwork. These results indicate that these genes could be related to the pathogenesis of, or therapeutic strategies for, IPF, which is consistent with prior studies. Conclusion: In the benchmark, CellPhoneDB had the best overall performance, but researchers are still advised to choose the most suitable prediction tool according to their research needs and conditions. The predictive map of CCIs in the Neo4j graph database for the human lung confirmed the feasibility, simplicity, and efficiency of graph databases in CCI research, showing their value for discovering the pathogenesis or therapeutic targets of diseases. |
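
The benchmark scoring described in the Method can be illustrated with a minimal sketch. It assumes each CCI is a "source-target-ligand-receptor" tuple and that the "all possible pair set" is the union of the gold standard and every tool's predictions. The function name `benchmark_metrics`, the tool labels, and the toy interactions are hypothetical and invented for illustration only. They are not data or code from this project.

```python
from math import sqrt


def benchmark_metrics(predicted, gold, universe):
    """Score one tool's predictions against a gold standard.

    Each CCI is a (source, target, ligand, receptor) tuple. `universe`
    plays the role of the "all possible pair set"; how it is built here
    is an assumption of this sketch.
    """
    predicted, gold, universe = set(predicted), set(gold), set(universe)
    tp = len(predicted & gold)             # predicted and in the gold standard
    fp = len(predicted - gold)             # predicted but not in the gold standard
    fn = len(gold - predicted)             # in the gold standard but missed
    tn = len(universe - predicted - gold)  # neither predicted nor in the gold standard

    def ratio(num, den):
        # Return 0.0 instead of raising when a denominator is empty.
        return num / den if den else 0.0

    return {
        "precision": ratio(tp, tp + fp),
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "accuracy": ratio(tp + tn, tp + fp + fn + tn),
        "f1": ratio(2 * tp, 2 * tp + fp + fn),
        "mcc": ratio(tp * tn - fp * fn,
                     sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))),
    }


# Toy, made-up interactions (not results from the study).
gold = {
    ("Fibroblast", "AT1", "TGFB1", "TGFBR1"),
    ("Macrophage", "AT1", "TNF", "TNFRSF1A"),
}
predictions = {
    "tool_a": {
        ("Fibroblast", "AT1", "TGFB1", "TGFBR1"),
        ("Macrophage", "AT2", "MIF", "CD74"),
    },
    "tool_b": {
        ("Fibroblast", "AT1", "TGFB1", "TGFBR1"),
        ("Macrophage", "AT1", "TNF", "TNFRSF1A"),
        ("T cell", "AT1", "IFNG", "IFNGR1"),
    },
}

# Assumed construction of the "all possible pair set":
# gold standard plus everything any tool predicted.
universe = set(gold).union(*predictions.values())

results = {name: benchmark_metrics(pred, gold, universe)
           for name, pred in predictions.items()}

for name, scores in results.items():
    print(name, {metric: round(value, 3) for metric, value in scores.items()})
```

Because true negatives are counted only within the pooled set, specificity and the Matthews correlation coefficient depend on which tools are included in the comparison. Under this construction the scores are relative to the benchmark panel rather than absolute.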