
Research On Density Peaks Clustering Algorithm Based On Tumor Gene Expression Data

Posted on: 2022-09-10
Degree: Master
Type: Thesis
Country: China
Candidate: Y F Jiang
Full Text: PDF
GTID: 2504306341986969
Subject: Computer technology
Abstract/Summary:
With the rapid development of biotechnology, increasingly mature gene chip technology is producing ever more gene expression data. Cluster analysis of tumor gene expression data to determine cancer subtypes has become an active research topic in China and abroad. Studying tumor gene expression data at the molecular level and analyzing the related pathogenic genes makes it possible to distinguish different subtypes of tumor cells. Because of the characteristics of gene expression and the high cost of gene chip technology, gene expression data are generally high-dimensional, have few samples, and have a complex structure. Finding a clustering algorithm suited to tumor gene expression data has therefore become a research focus. The Density Peak Clustering (DPC) algorithm is conceptually simple, has few parameters, and achieves high clustering accuracy, and it has attracted the attention of researchers in China and abroad. Applying it to tumor gene expression data to determine cancer subtypes therefore has considerable research value and significance. This thesis takes the improvement of the DPC algorithm in light of the characteristics of tumor gene expression data as its overall research direction, and applies the improved algorithm to tumor gene expression data sets to determine the different subtypes of cancer patients. The main research contents are as follows:

(1) The key parameters of the traditional DPC algorithm must be selected manually, and its label allocation strategy for non-center points does not consider the correlation between data points. To address these problems, a density peak clustering algorithm combining KNN and graph label propagation (Density Peak Clustering Algorithm Combined with KNN and Label Propagation, DPC-NNLP) is proposed. The algorithm uses the idea of the KNN algorithm to calculate the local density of each sample point and constructs local density backbone areas from the nearest neighbors found by KNN. Finally, a density-based KNN graph is used to propagate the labels of the points in the known backbone areas to the remaining points, forming the final clusters. Clustering simulation experiments with the algorithm were carried out on a variety of data sets that differ widely in shape and density.

(2) The traditional DPC algorithm clusters high-dimensional data poorly and has high complexity. To address these problems, this thesis proposes a density peak clustering algorithm based on rough set subspace (Density Peak Clustering Based on Rough Set Subspace, DPC-RSS). The algorithm retains the advantages of the DPC algorithm and adopts an iterative process overall. It builds on the subspace clustering mode and applies rough set theory to improve the clustering idea. By combining the DPC algorithm's sound selection of cluster centers with the efficient handling of high-dimensional data offered by subspace clustering, it effectively avoids the problems the traditional DPC algorithm encounters on high-dimensional data. Clustering simulation experiments with the improved algorithm were carried out on multiple high-dimensional data sets.

(3) The algorithm proposed in this thesis is applied to the analysis of tumor cell subtypes. The tumor gene expression data set is first preprocessed, and the algorithm is then applied to it; the different subtypes of tumor cells are determined by analyzing the differential expression between genes. Multiple sets of simulation experiments were conducted to show that the proposed algorithm can determine the subtype of tumor cells more accurately.
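To make the idea of a KNN-based local density inside density peak clustering concrete, here is a minimal Python sketch. It is not the thesis's DPC-NNLP or DPC-RSS: it is the baseline density peak procedure (density, distance to the nearest denser point, then label inheritance) with one assumed KNN density formula, the exponential of the negative mean squared distance to the k nearest neighbors. The function name `knn_density_peaks` and the parameters `k` and `n_clusters` are hypothetical, chosen only for illustration.

```python
import numpy as np


def knn_density_peaks(X, k=5, n_clusters=2):
    """Baseline density peak clustering with an assumed KNN-based density."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]

    # Pairwise Euclidean distances (fine for small n; O(n^2) memory).
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))

    # Assumed KNN density: exp(-mean squared distance to the k nearest
    # neighbours). Column 0 of the sorted row is the point itself.
    knn_dist = np.sort(dist, axis=1)[:, 1:k + 1]
    rho = np.exp(-(knn_dist ** 2).mean(axis=1))

    # Visit points from densest to sparsest.
    order = np.argsort(-rho, kind="stable")
    delta = np.zeros(n)
    parent = np.full(n, -1)
    for rank, i in enumerate(order):
        if rank == 0:
            # Densest point: delta is its largest distance to any point.
            delta[i] = dist[i].max()
            continue
        higher = order[:rank]
        j = higher[np.argmin(dist[i, higher])]
        delta[i] = dist[i, j]   # distance to the nearest denser point
        parent[i] = j

    # Centers are the points with the largest rho * delta. The densest
    # point is forced to be a center so every point has a labelled ancestor.
    gamma = rho * delta
    gamma[order[0]] = np.inf
    centers = np.argsort(-gamma)[:n_clusters]

    labels = np.full(n, -1)
    labels[centers] = np.arange(n_clusters)
    # Each remaining point inherits the label of its nearest denser point,
    # which has already been labelled because of the visiting order.
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[parent[i]]
    return labels, centers
```

Calling `knn_density_peaks(X, k=5, n_clusters=2)` on an array of shape `(n_samples, n_features)` returns one label per sample and the indices of the chosen centers. The final label inheritance step is the baseline allocation strategy that item (1) above criticizes for ignoring correlation between points; the thesis replaces it with label propagation over a density-based KNN graph, which this sketch does not attempt to reproduce.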
Keywords/Search Tags:Tumor gene expression data, Density Peak Clustering, KNN Algorithm, Subspace Clustering, Tumor cell subtype clustering