
Exploration And Application Of Gene Semantic Similarity And Cell Semantic Similarity Based On Single-cell Data

Posted on: 2024-07-11    Degree: Master    Type: Thesis
Country: China    Candidate: P F Guo
GTID: 2530306926987309    Subject: Bioinformatics
Abstract/Summary:
Background and purpose: Semantic similarity measures how similar two objects are based on their meaning. Biology has accumulated a large amount of knowledge-graph data, and semantic similarity can be used to evaluate relationships between molecules. Using the Gene Ontology and gene annotations, we can calculate semantic similarity values for genes or gene products. Using the Cell Ontology, we can evaluate the similarity between cells. However, in single-cell analysis there is still little work on the application scenarios of semantic similarity. First, considering the advantages of the TCSS (Topological Clustering Semantic Similarity) method in evaluating protein interactions, we added it to GOSemSim, a widely used and frequently cited package for calculating semantic similarity based on the Gene Ontology. Building on GOSemSim, we developed the COSemSim package to quantitatively compare the semantic similarity of cells based on the Cell Ontology and to investigate the relationships between cell clusters. In addition, we explored the effect of gene semantic similarity on single-cell pseudotime analysis and found that combining semantic similarity with single-cell gene expression information could improve the prediction of cell development.

Results and conclusions:
1. We improved the TCSS method and added it to the GOSemSim package, expanding the range of algorithms and application scenarios the package supports. TCSS reduces the bias caused by the unbalanced structure of the ontology by splitting the whole Gene Ontology graph into subgraphs, and it gives better predictions of protein-protein interactions. We added a cutoff-calculation step to TCSS so that the subgraph size is determined more reasonably. We then obtained real data from several protein-protein interaction databases and compared the predictions of TCSS with those of other semantic similarity methods. All results show that TCSS with the optimal cutoff performs better than TCSS with a non-optimal cutoff, and that both perform better than Resnik's method.
2. We developed a package named COSemSim to calculate semantic similarity between cells or cell clusters based on Cell Ontology data. We use the richness of a cell type's offspring in the ontology structure to represent its information content, and we call the calculation functions in GOSemSim to compute semantic similarity. First, we calculated the semantic similarity of the cell types in hematopoietic stem cell lineages and confirmed that the Cell Ontology reflects the lineage and function of cell types. Next, we calculated the semantic similarity and the expression similarity for seven single-cell datasets in the SingleR package and found that the correlation between the two is about 0.5; we then compared the cell clusterings obtained from each. The clustering based on semantic similarity was better than the clustering based on expression similarity. Finally, we used cell semantic similarity to calculate the relationships between cell clusters and found them consistent with the real cluster characteristics.
3. We explored the application of gene semantic similarity to single-cell pseudotime analysis. We combined the semantic similarity between genes with their expression across all cells, so that the pseudotime analysis considers both the functional relationships and the expression characteristics of genes. We then compared the predicted pseudotime and developmental trajectories for two single-cell datasets. Compared with the original expression data, the predictions from data combined with gene semantic similarity were closer to the actual developmental characteristics of the cells, indicating that semantic similarity between genes can help us understand the relationship between cell development and the true dynamic changes of cells.
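Result 2 states that COSemSim represents a cell type's information content (IC) by the richness of its offspring in the ontology, and Result 1 uses Resnik's method as a baseline. The sketch below shows how the two ideas fit together on a small hypothetical ontology. It assumes the common intrinsic formulation IC(t) = -ln(|subtree(t)| / |all terms|), where subtree(t) is t plus all of its descendants. The exact formula used by COSemSim is not given here, and the cell-type graph is a toy, not real Cell Ontology data. Resnik's similarity is then the IC of the most informative common ancestor of the two terms.

```python
import math

# Hypothetical toy ontology (not real Cell Ontology data): term -> parents via is_a.
PARENTS = {
    "cell": [],
    "hematopoietic cell": ["cell"],
    "myeloid cell": ["hematopoietic cell"],
    "lymphocyte": ["hematopoietic cell"],
    "monocyte": ["myeloid cell"],
    "T cell": ["lymphocyte"],
    "B cell": ["lymphocyte"],
}


def ancestors(term):
    """Return the term itself plus all of its ancestors in the DAG."""
    seen, stack = set(), [term]
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(PARENTS[t])
    return seen


def subtree_sizes():
    """Count, for every term, how many terms lie in its subtree (itself included)."""
    counts = {t: 0 for t in PARENTS}
    for t in PARENTS:
        for a in ancestors(t):
            counts[a] += 1
    return counts


def information_content():
    """Assumed offspring-based IC: -ln(|subtree(t)| / |all terms|); the root gets 0."""
    n = len(PARENTS)
    return {t: -math.log(c / n) for t, c in subtree_sizes().items()}


def resnik(t1, t2, ic):
    """Resnik similarity: IC of the most informative common ancestor."""
    return max(ic[a] for a in ancestors(t1) & ancestors(t2))


ic = information_content()
print(resnik("T cell", "B cell", ic), resnik("T cell", "monocyte", ic))
```

With this toy graph, T cell and B cell share the ancestor lymphocyte, so their score, ln(7/3), is higher than that of T cell and monocyte, whose most informative common ancestor is only hematopoietic cell (score ln(7/6)).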
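Genes usually carry several GO annotations, so term-level scores (Resnik, TCSS, and so on) must be combined into one gene-level score before gene semantic similarity can be used. GOSemSim offers several combination strategies. The sketch below implements best-match average (BMA), which averages, over both annotation sets, each term's best match in the other set. The term identifiers and pairwise scores in `TOY_SIM` are made up for illustration. In practice they would come from a term-level measure.

```python
# Hypothetical term-level similarity scores (symmetric; identical terms score 1.0).
TOY_SIM = {
    ("t1", "t1"): 1.0, ("t2", "t2"): 1.0, ("t3", "t3"): 1.0,
    ("t1", "t2"): 0.6, ("t1", "t3"): 0.2, ("t2", "t3"): 0.4,
}


def toy_term_sim(a, b):
    """Look up a symmetric pairwise term similarity, defaulting to 0."""
    return TOY_SIM.get((a, b), TOY_SIM.get((b, a), 0.0))


def bma(term_sim, terms_a, terms_b):
    """Best-match average of term similarities between two annotation sets."""
    # Best match in set B for every term of set A, and vice versa.
    row_best = [max(term_sim(a, b) for b in terms_b) for a in terms_a]
    col_best = [max(term_sim(a, b) for a in terms_a) for b in terms_b]
    return (sum(row_best) + sum(col_best)) / (len(terms_a) + len(terms_b))


# Two hypothetical genes annotated with {t1, t2} and {t3}.
print(bma(toy_term_sim, ["t1", "t2"], ["t3"]))
```

Here the gene annotated with {t1, t2} and the gene annotated with {t3} score (0.2 + 0.4 + 0.4) / 3 ≈ 0.333. BMA rewards genes whose annotations all find a close partner, rather than relying on a single best pair as the max strategy does.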
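Result 2 reports a correlation of about 0.5 between semantic similarity and expression similarity. The abstract does not say how the two were compared. The minimal sketch below assumes both are symmetric cell-type by cell-type matrices and correlates only their off-diagonal entries, taking each pair once, with Pearson's r. The matrix values are invented for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def upper_triangle(m):
    """Entries above the diagonal, so each unordered pair is counted once."""
    k = len(m)
    return [m[i][j] for i in range(k) for j in range(i + 1, k)]


def matrix_correlation(sem, expr):
    """Correlate two similarity matrices over their off-diagonal pairs."""
    return pearson(upper_triangle(sem), upper_triangle(expr))


# Invented 3 x 3 similarity matrices for three hypothetical cell types.
SEM = [[1.0, 0.8, 0.2], [0.8, 1.0, 0.3], [0.2, 0.3, 1.0]]
EXPR = [[1.0, 0.9, 0.1], [0.9, 1.0, 0.5], [0.1, 0.5, 1.0]]
print(matrix_correlation(SEM, EXPR))
```

Dropping the diagonal matters because every matrix has 1.0 there, and keeping those entries would inflate the correlation.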
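Result 2 also compares clusterings built from semantic similarity with clusterings built from expression similarity, but the abstract does not name the clustering algorithm. As an assumption, the sketch below converts similarity to distance as 1 - sim and runs plain average-linkage agglomerative clustering until `k` clusters remain. The same function can be applied to either matrix, so the two resulting partitions can be compared directly. The matrix `CELL_SIM` is a made-up example.

```python
def average_linkage(sim, k):
    """Agglomerative clustering on distance 1 - sim, stopping at k clusters."""
    clusters = [[i] for i in range(len(sim))]

    def dist(a, b):
        # Average pairwise distance between the members of two clusters.
        return sum(1 - sim[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > k:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = dist(clusters[x], clusters[y])
                if best is None or d < best[0]:
                    best = (d, x, y)
        _, x, y = best
        clusters[x] = clusters[x] + clusters[y]  # merge the closest pair
        del clusters[y]                          # y > x, so index x stays valid
    return sorted(sorted(c) for c in clusters)


# Made-up similarity matrix: items 0/1 and items 2/3 are close to each other.
CELL_SIM = [
    [1.0, 0.9, 0.1, 0.1],
    [0.9, 1.0, 0.1, 0.1],
    [0.1, 0.1, 1.0, 0.8],
    [0.1, 0.1, 0.8, 1.0],
]
print(average_linkage(CELL_SIM, 2))  # [[0, 1], [2, 3]]
```

This quadratic-per-step loop is only suitable for small matrices such as cell-type summaries. Clustering thousands of individual cells would call for an optimized library implementation.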
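Result 3 combines gene semantic similarity with expression before pseudotime analysis, but the abstract does not give the combination formula. The following is only a hypothetical illustration of one way this could be done, not the thesis's method. Each cell's expression vector x is mixed with a similarity-weighted average over genes, x' = (1 - alpha) x + alpha W x, where W is the gene-gene semantic similarity matrix with each row normalized to sum to 1. The function name `smooth_expression` and the parameter `alpha` are invented here. The smoothed matrix would then be passed to whichever pseudotime tool is in use.

```python
def smooth_expression(expr, gene_sim, alpha=0.5):
    """Hypothetical mixing of expression with gene semantic similarity.

    expr: cells x genes expression values.
    gene_sim: genes x genes non-negative semantic similarity (self-similarity 1).
    alpha: weight on the similarity-smoothed part (0 returns expr unchanged).
    """
    g = len(gene_sim)
    # Row-normalize so each gene's neighbour weights sum to 1.
    weights = []
    for row in gene_sim:
        s = sum(row)
        if s <= 0:
            raise ValueError("each gene needs a positive total similarity")
        weights.append([v / s for v in row])
    out = []
    for cell in expr:
        # Similarity-weighted average of the cell's expression for each gene.
        mixed = [sum(weights[i][j] * cell[j] for j in range(g)) for i in range(g)]
        out.append([(1 - alpha) * cell[i] + alpha * mixed[i] for i in range(g)])
    return out


# One cell, two genes that are (hypothetically) fully similar to each other.
print(smooth_expression([[2.0, 0.0]], [[1.0, 1.0], [1.0, 1.0]]))  # [[1.5, 0.5]]
```

The design lets a gene with a dropout (zero) value borrow signal from functionally similar genes, which is one plausible reason why adding functional information could stabilize trajectory inference. Whether the thesis uses this or a different scheme is not stated in the abstract.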
Keywords/Search Tags: Semantic similarity, Gene Ontology, Cell Ontology, Pseudotime analysis