
Research On Clustering Algorithms Based On Different Types Of Data

Posted on: 2020-10-18
Degree: Master
Type: Thesis
Country: China
Candidate: M M Zhang
Full Text: PDF
GTID: 2370330578969131
Subject: Statistics
Abstract/Summary:
Cluster analysis is an unsupervised learning method and one of the main approaches to data mining. It is also an important research direction in statistical machine learning and pattern recognition. Clustering research on numerical data has produced fruitful results, but in practice most data are categorical or mixed. It is therefore particularly important to study clustering algorithms for different types of data. The clustering process poses two difficult problems: the selection of class centers and the determination of the number of clusters. To address these problems, the paper takes categorical and mixed data as examples and presents the following results:

(1) For categorical data, the paper focuses on matrix-object data and proposes an improved MD fuzzy k-modes algorithm for categorical matrix-object data. The algorithm extends simple "0-1" matching, redefines the dissimilarity measure for matrix-object data, and overcomes the loss of information that occurs when such data are clustered with traditional algorithms. For the selection of class centers, a heuristic updating algorithm based on the concept of fuzzy sets is proposed, which greatly reduces the time complexity. Finally, the effectiveness of the MD fuzzy k-modes algorithm is verified on five UCI data sets.

(2) For mixed data, a measure of attribute weight based on information entropy is given, and a weighted k-prototype algorithm is proposed to determine the number of clusters. Taking attribute weights into account, the algorithm redefines the sum of between-class entropies in the absence of a cluster, the validity index, and the dissimilarity measure for mixed data. The experimental results show that the new weighted k-prototype clustering algorithm outperforms the Liang k-prototype algorithm on six evaluation indexes, including clustering accuracy.

The results of the paper enrich research on clustering algorithms for different types of data, provide, to some extent, new methods for clustering categorical matrix-object data and mixed data, and offer new technical support for related fields of data mining.
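The abstract does not give the thesis's formulas, so the following Python sketch only illustrates the general ideas it names and is not the author's method. It shows the classic "0-1" simple matching dissimilarity used by k-modes, a hypothetical set-based extension for matrix-objects (Jaccard distance per attribute, chosen here purely as an assumption), and an assumed entropy-based attribute weighting that normalizes Shannon entropies to sum to one. All function names are illustrative.

```python
from collections import Counter
from math import log2


def simple_matching(x, y):
    """Classic k-modes dissimilarity: number of attributes whose values differ."""
    if len(x) != len(y):
        raise ValueError("objects must have the same number of attributes")
    return sum(a != b for a, b in zip(x, y))


def set_dissimilarity(x, y):
    """Assumed extension for matrix-objects: each attribute holds a set of
    values, compared by Jaccard distance instead of 0-1 matching."""
    total = 0.0
    for a, b in zip(x, y):
        union = a | b
        if union:  # two empty sets are treated as identical
            total += 1.0 - len(a & b) / len(union)
    return total


def entropy(values):
    """Shannon entropy (in bits) of one categorical attribute column."""
    n = len(values)
    return sum((c / n) * log2(n / c) for c in Counter(values).values())


def entropy_weights(columns):
    """Assumed weighting: each attribute's entropy divided by the total.
    Falls back to uniform weights when every attribute is constant."""
    h = [entropy(col) for col in columns]
    total = sum(h)
    if total == 0:
        return [1.0 / len(h)] * len(h)
    return [e / total for e in h]
```

With 0-1 matching, two objects that share one of two values on an attribute count as fully different on that attribute; the set-based variant instead returns a partial distance, which is one way to keep the information that 0-1 matching discards.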
Keywords/Search Tags: Clustering algorithm, Matrix-object data, Dissimilarity measure, Class center, Cluster number, Mixed data, Attribute weight