| Cluster analysis is an unsupervised learning method and an important technique in data mining. More than 50 years have passed since the k-means algorithm was proposed. As one of the most commonly used clustering algorithms, it has been widely adopted by researchers worldwide because it is simple, efficient, fast to converge, and easy to implement. In the era of big data, however, every sector of society generates large amounts of data and information, and applying k-means to such massive, high-dimensional datasets exposes two weaknesses. On the one hand, k-means is sensitive to the choice of initial cluster centers, which affects the stability and accuracy of the clustering results and makes it prone to falling into local optima. On the other hand, k-means is a hard clustering algorithm: each object is assigned to exactly one cluster, so the boundaries between clusters are crisp. When the data are uncertain, forcing an object into a single cluster often carries a high decision risk and lowers clustering accuracy, so traditional k-means cannot handle such data effectively. The three-way k-means clustering algorithm introduces the idea of three-way decision and represents each cluster by a core domain and a boundary domain, a significant improvement over traditional k-means. It allows a tolerated error (allowable error) during the k-means iterations and expresses the result for each cluster through its core and boundary domains, which handles uncertain data better, but it remains sensitive to the initial cluster centers. This article therefore studies and improves the three-way k-means clustering algorithm in two respects. (I) A three-way k-means clustering algorithm based on the artificial bee colony (ABC) algorithm is proposed. An improvement is made
to address the sensitivity of the three-way k-means clustering algorithm to the initial cluster centers and its tendency to fall into local optima. To improve how the initial cluster centers of the three-way k-means algorithm are computed, a method is designed that dynamically adjusts the weights according to the numbers of data objects in the core and boundary domains, and the fitness function of a food source is constructed by defining an intra-class aggregation function and an inter-class dispersion function, so that the algorithm can quickly approach the optimal solution in the search space. The algorithm first randomly selects combinations of k cluster centers as the initial food sources and computes their fitness values. Then, through cooperation and information exchange among the bees, it iterates over the dataset multiple times to find the best food source, whose position is used as the set of cluster centers; on this basis, three-way k-means clustering is performed, with the two stages carried out alternately. Comparative experiments against other algorithms on UCI datasets, using several clustering indices, show that the improved method makes the clustering results more stable. (II) A three-way k-means clustering algorithm based on the ant colony algorithm is proposed. It targets two weaknesses of three-way k-means: sensitivity to the initial points, and empirically set weights that ignore differences between datasets. The improved algorithm exploits the random probabilistic selection strategy of the ant colony algorithm and the positive and negative feedback of pheromones. To avoid the influence of manually set weights on the clustering results, and in view of previous research results, a method that automatically adjusts the weights of the core and boundary domains according to the ratio between the numbers of objects in the core and boundary
domains is adopted. The objective function is improved so that it accounts for both intra-class and inter-class distances, in order to optimize the three-way k-means clustering algorithm. In the algorithm, an ant first randomly selects a sample in the sample space as its starting point. Using the amount of pheromone between that sample and each cluster center together with a heuristic function, it computes the probability of assigning the sample to each cluster center through the random probabilistic selection strategy, and allocates the sample to one cluster center by roulette-wheel selection. The ant then selects another sample and repeats the process until all samples are classified; this completes one iteration and yields one solution. The objective function value of each solution is computed, and the best solution is saved. Finally, comparative experiments on UCI datasets show that the algorithm yields a smaller Davies-Bouldin index (DBI) and a larger average silhouette coefficient, that is, more accurate clustering results. |
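The core/boundary representation and the allowable error of three-way k-means can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions. The allowable error is taken to be a distance-difference threshold `eps`: an object goes into the boundary domain of its nearest cluster and of every other cluster whose center is at most `eps` farther away, and otherwise into the core domain of its nearest cluster. The count-based weight rule `w_core = |core| / (|core| + |boundary|)` is a hypothetical stand-in for the dynamic weighting described above, because its exact formula is not given here. All function names are illustrative.

```python
import math


def mean(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(col) / n for col in zip(*points))


def assign_three_way(data, centers, eps):
    """Split object indices into core and boundary domains for each cluster.

    Assumption: an object joins the boundary domain of its nearest cluster and
    of every other cluster whose center is at most `eps` farther away;
    otherwise it joins the core domain of its nearest cluster.
    """
    k = len(centers)
    core = [set() for _ in range(k)]
    boundary = [set() for _ in range(k)]
    for idx, x in enumerate(data):
        d = [math.dist(x, c) for c in centers]
        nearest = min(range(k), key=d.__getitem__)
        close = [j for j in range(k) if j != nearest and d[j] - d[nearest] <= eps]
        if close:
            for j in [nearest] + close:
                boundary[j].add(idx)
        else:
            core[nearest].add(idx)
    return core, boundary


def update_centers(data, centers, core, boundary):
    """Weighted mix of core and boundary means (hypothetical count-based weights)."""
    new_centers = []
    for i, old in enumerate(centers):
        n_core, n_bnd = len(core[i]), len(boundary[i])
        if n_core + n_bnd == 0:
            new_centers.append(old)  # empty cluster: keep its previous center
            continue
        parts = []  # (weight, domain mean); the weights sum to 1
        if n_core:
            parts.append((n_core / (n_core + n_bnd), mean([data[j] for j in core[i]])))
        if n_bnd:
            parts.append((n_bnd / (n_core + n_bnd), mean([data[j] for j in boundary[i]])))
        new_centers.append(tuple(sum(w * m[d] for w, m in parts) for d in range(len(old))))
    return new_centers


def three_way_kmeans(data, init_centers, eps, max_iter=100):
    """Alternate three-way assignment and center update until the centers stop moving."""
    centers = [tuple(c) for c in init_centers]
    for _ in range(max_iter):
        core, boundary = assign_three_way(data, centers, eps)
        new_centers = update_centers(data, centers, core, boundary)
        if new_centers == centers:
            break
        centers = new_centers
    return core, boundary, centers
```

Consider two well-separated groups of points plus one point midway between them. With `eps=2.0`, the midpoint ends up in the boundary domain of both clusters, and every other point lands in a core domain. With these particular count-based weights, the update equals the plain mean of a cluster's core and boundary objects taken together. A different weighting rule would therefore change the behavior, so substitute whichever rule the method actually specifies.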
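The ant's assignment step and the pheromone feedback can be sketched as follows, under several assumptions:

- The pheromone `tau[i][j]` links sample i to cluster j.
- The heuristic is the inverse distance from the sample to each current center.
- The selection probability is proportional to `tau**alpha * eta**beta`, and the cluster is drawn by roulette wheel.
- The objective to minimize is a hypothetical ratio of the mean intra-class distance to the smallest inter-class center distance.

Evaporation acts as the negative feedback, and reinforcing the best solution acts as the positive feedback. The three-way core/boundary split and the weight adjustment are omitted to keep the sketch short. All names and parameter defaults are illustrative.

```python
import math
import random


def roulette(weights, rng):
    """Pick index j with probability weights[j] / sum(weights)."""
    r = rng.uniform(0.0, sum(weights))
    acc = 0.0
    for j, w in enumerate(weights):
        acc += w
        if acc >= r:
            return j
    return len(weights) - 1  # guard against floating-point shortfall


def build_solution(data, centers, tau, alpha, beta, rng):
    """One ant assigns every sample to a cluster center."""
    labels = [0] * len(data)
    order = list(range(len(data)))
    rng.shuffle(order)  # random starting sample, then the rest in random order
    for i in order:
        # Assumed heuristic: inverse distance from the sample to each center.
        eta = [1.0 / (math.dist(data[i], c) + 1e-12) for c in centers]
        w = [tau[i][j] ** alpha * eta[j] ** beta for j in range(len(centers))]
        labels[i] = roulette(w, rng)
    return labels


def recompute_centers(data, labels, old_centers):
    """Mean of each cluster's members; an empty cluster keeps its old center."""
    new = []
    for j, old in enumerate(old_centers):
        members = [x for x, l in zip(data, labels) if l == j]
        new.append(tuple(sum(c) / len(members) for c in zip(*members)) if members else old)
    return new


def objective(data, labels, centers):
    """Hypothetical objective to minimize: intra-class distance / inter-class distance."""
    intra = sum(math.dist(x, centers[l]) for x, l in zip(data, labels)) / len(data)
    inter = min(math.dist(a, b) for i, a in enumerate(centers) for b in centers[i + 1:])
    return intra / (inter + 1e-12)


def aco_cluster(data, k, n_ants=10, iters=30, alpha=1.0, beta=2.0, rho=0.1, q=1.0, seed=0):
    rng = random.Random(seed)
    centers = rng.sample(data, k)
    tau = [[1.0] * k for _ in data]  # pheromone between sample i and cluster j
    best_labels, best_centers, best_obj = None, centers, math.inf
    for _ in range(iters):
        for _ in range(n_ants):
            labels = build_solution(data, centers, tau, alpha, beta, rng)
            cand = recompute_centers(data, labels, centers)
            obj = objective(data, labels, cand)
            if obj < best_obj:  # save the best solution found so far
                best_labels, best_centers, best_obj = labels, cand, obj
        for row in tau:  # negative feedback: evaporation
            for j in range(k):
                row[j] *= 1.0 - rho
        for i, l in enumerate(best_labels):  # positive feedback: reinforce best assignment
            tau[i][l] += q / (best_obj + 1e-12)
        centers = best_centers
    return best_labels, best_centers
```

With a very large `beta`, the roulette choice becomes effectively a nearest-center assignment. Smaller values let the pheromone steer the ants, which is where the stochastic exploration comes from.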
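The two evaluation indices named above can be computed as follows for a hard labelling:

- The Davies-Bouldin index is DBI = (1/k) Σ_i max_{j≠i} (S_i + S_j) / d(c_i, c_j), where S_i is the mean distance of cluster i's members to its centroid c_i. Lower is better.
- The silhouette of an object is s = (b − a) / max(a, b), where a is its mean distance to its own cluster and b is its smallest mean distance to another cluster. A higher average is better.

This text does not state how objects in boundary domains enter the indices. The sketch therefore assumes every object carries exactly one label, for instance its nearest cluster. scikit-learn's `sklearn.metrics.davies_bouldin_score` and `sklearn.metrics.silhouette_score` compute the same quantities on arrays; the pure-Python version below only makes the formulas explicit.

```python
import math


def _centroid(points):
    return tuple(sum(c) / len(points) for c in zip(*points))


def davies_bouldin(data, labels):
    """Davies-Bouldin index for a hard partition with at least two clusters."""
    clusters = sorted(set(labels))
    groups = {c: [x for x, l in zip(data, labels) if l == c] for c in clusters}
    cents = {c: _centroid(groups[c]) for c in clusters}
    # S_i: average distance of cluster members to their centroid.
    scatter = {c: sum(math.dist(x, cents[c]) for x in groups[c]) / len(groups[c])
               for c in clusters}
    total = 0.0
    for i in clusters:
        total += max((scatter[i] + scatter[j]) / math.dist(cents[i], cents[j])
                     for j in clusters if j != i)
    return total / len(clusters)


def mean_silhouette(data, labels):
    """Average silhouette coefficient; objects in singleton clusters score 0."""
    scores = []
    for i, (x, li) in enumerate(zip(data, labels)):
        by_cluster = {}  # label -> distances from x to the other objects with that label
        for j, (y, lj) in enumerate(zip(data, labels)):
            if j != i:
                by_cluster.setdefault(lj, []).append(math.dist(x, y))
        own = by_cluster.get(li)
        if not own:
            scores.append(0.0)
            continue
        a = sum(own) / len(own)
        b = min(sum(d) / len(d) for l, d in by_cluster.items() if l != li)
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)
```

Take two tight, well-separated one-dimensional groups {0, 1} and {10, 11}. For these, the DBI is 0.1 and the mean silhouette is 359/399 ≈ 0.900.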