
Research On Density Peaks Clustering Algorithm Based On DNA Microarray Data And Its Application

Posted on: 2021-03-24
Degree: Master
Type: Thesis
Country: China
Candidate: X G Liu
GTID: 2370330605461119
Subject: Software engineering
Abstract/Summary:
In bioinformatics, cluster analysis of tumor samples based on DNA microarray data, with the aim of distinguishing tumor types and subtypes, is a major research focus. Analyzing tumors at the molecular level with DNA microarray data makes it possible not only to separate subtypes of the same tumor according to the differing expression of related pathogenic genes, but also to predict and classify previously unknown subtypes. Because of the characteristics of genes and the high cost of DNA microarray technology, DNA microarray datasets are mostly high-dimensional with small sample sizes. Density Peak Clustering (DPC), published in Science in 2014, has been widely adopted in many fields because it has few parameters and high clustering accuracy, and it is of high research value. In this dissertation, several clustering methods are proposed to improve the original DPC, and the improved algorithms are applied to DNA microarray datasets to study the clustering of tumor subtypes. The main research contents are as follows:

(1) To remove the need for manual selection of the key parameters of DPC, a method that combines DPC with an intelligent optimization algorithm, the Bat Algorithm (BA), is proposed. First, an adaptive weight is added to the velocity update formula of the BA to address its slow convergence and its tendency to fall into local optima during the search. Then, a cluster validity index is used as the fitness function, and the improved BA selects the key parameters of DPC. The method for selecting the initial centroids is also improved. The validity of the method is verified by experimental comparison.

(2) To improve the performance of DPC on high-dimensional complex datasets and to replace its simple strategy for allocating the remaining points, DPC is combined with the Entropy Weighting K-Means algorithm (EWKM). The combination draws on EWKM's capacity for processing high-dimensional complex data and its more reasonable allocation strategy for the remaining points, keeping the advantages of both algorithms while avoiding their respective shortcomings. DPC is used to select the initial cluster centers, and EWKM is used to allocate the subsequent data points. The validity of the method is verified by experimental comparison.

(3) The proposed algorithm is applied to tumor subtype clustering. First, the DNA microarray dataset is pre-processed to remove genes that are not related to tumor incidence. Then the proposed algorithm is applied to the DNA microarray dataset to identify clusters corresponding to different tumor subtypes from the differential expression of genes. Experimental comparison shows that the algorithm can accurately cluster tumor subtypes by analyzing the differential expression of different genes, which is of practical significance.
Keywords/Search Tags: DNA microarray, Density Peak Clustering, Bat algorithm, Entropy Weighting K-Means Algorithm for Subspace Clustering, Tumor subtype clustering
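
For readers unfamiliar with Density Peak Clustering, the following is a minimal sketch of the baseline algorithm as generally described (local density, distance to the nearest denser point, and assignment of the remaining points to the cluster of their nearest denser neighbor). It is not the improved method of this thesis: the function name `density_peaks`, the cutoff-kernel density, and the rule of taking the points with the largest product of density and distance as centers are assumptions made for illustration. The cutoff distance `dc` and the number of clusters `n_clusters` are supplied by hand here, which is the kind of manual parameter choice the abstract says the thesis aims to automate.

```python
import numpy as np

def density_peaks(X, dc, n_clusters):
    """Baseline DPC sketch (hypothetical helper, cutoff kernel)."""
    n = X.shape[0]
    # Pairwise Euclidean distances between all samples.
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))

    # Local density rho: number of OTHER points closer than dc.
    rho = (dist < dc).sum(axis=1) - 1

    # Visit points from highest to lowest density (stable for ties).
    order = np.argsort(-rho, kind="stable")
    delta = np.zeros(n)
    nearest_higher = np.full(n, -1)
    for rank, i in enumerate(order):
        if rank == 0:
            # Densest point: use the global maximum distance so that
            # it always ranks first among the center candidates.
            delta[i] = dist.max()
            continue
        higher = order[:rank]  # points visited earlier (denser or tied)
        j = higher[np.argmin(dist[i, higher])]
        delta[i] = dist[i, j]
        nearest_higher[i] = j

    # Assumed selection rule: centers have the largest rho * delta.
    gamma = rho * delta
    centers = np.argsort(-gamma, kind="stable")[:n_clusters]

    # Remaining points inherit the label of their nearest denser point.
    labels = np.full(n, -1)
    labels[centers] = np.arange(n_clusters)
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[nearest_higher[i]]
    return labels, centers, rho, delta
```

Calling `density_peaks(X, dc=1.0, n_clusters=2)` on a small two-dimensional array with two well-separated groups returns one label per row of `X` together with the indices of the chosen centers. The one-pass assignment in the last loop is the "simple allocation strategy of the remaining points" that part (2) of the abstract replaces with EWKM.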