The Application Research Of K-means Algorithm For Geological Hazard System

Posted on: 2019-02-03
Degree: Master
Type: Thesis
Country: China
Candidate: L Y Xue
Full Text: PDF
GTID: 2370330545957840
Subject: Computer system architecture
Abstract/Summary:
K-means is a classical clustering algorithm. It is simple and often produces good clustering results. However, it also has several deficiencies. First, the number of clusters K must be set in advance. Second, the result is strongly affected by the initial cluster centers: if the selected initial centers are not dispersed enough, they cannot reflect the distribution of the original data set well. Finally, the time complexity of the algorithm is high. To reduce the dependence of the K-means algorithm on its initial values and to improve its effectiveness, this thesis studies the optimization of the initial cluster centers of the K-means algorithm. The main research contents are as follows:

1) Apply uniform sampling to the original data set. Before selecting each cluster center, the K-means algorithm must scan the database once, which results in a very large amount of computation. This thesis therefore samples the original data set first, which both serves as preprocessing of the original data and lets the advantages of the K-means algorithm come into play.

2) Improve the selection of initial cluster centers. In the initial stage of clustering, the traditional UPGMA algorithm finds dense areas well, but it does not readily reveal the order in which clusters form, so the selected initial cluster centers may not represent the distribution of the actual data set. Clustering conditions and screening conditions are therefore added to ensure that all candidate initial cluster centers come from high-density areas, while edge data and noise data are avoided. The improved UPGMA algorithm also has a disadvantage: if the clustering condition and the screening condition are not set properly, the selected initial cluster centers may be too dense. Adding the maximum and minimum distance (max-min distance) algorithm to the Canopy algorithm compensates for this disadvantage, ensuring that the obtained initial cluster centers are not too dense and can therefore accurately reflect the distribution of the actual data set.

3) Design the CMU-kmeans algorithm (K-means Algorithm based on Canopy with Min-Max Algorithm and UPGMA Algorithm). The algorithm determines the number of clusters k adaptively and also obtains optimized initial cluster centers effectively. To a great extent, it therefore places the selection of the initial values on a sound footing.

4) Use the improved algorithm to perform cluster analysis on the historical data sets of the geological disaster monitoring system. Experiments verify the effectiveness and applicability of the improved algorithm.
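The sampling and max-min distance ideas described above can be illustrated with a minimal Python sketch. This is not the thesis's CMU-kmeans algorithm: the Canopy and UPGMA stages are omitted, and the function names (`uniform_sample`, `max_min_centers`, `kmeans`), the stopping ratio `theta`, and the synthetic data are all assumptions made for illustration. The sketch samples the data uniformly, picks well-separated seeds with the classical max-min distance rule (which also yields a value of k), and then runs ordinary K-means from those seeds.

```python
import math
import random


def uniform_sample(points, fraction, seed=0):
    """Draw a uniform random sample (without replacement) of the data set."""
    rng = random.Random(seed)
    size = min(len(points), max(1, int(len(points) * fraction)))
    return rng.sample(points, size)


def max_min_centers(points, theta=0.5):
    """Pick well-separated seeds with the max-min distance rule.

    theta is a hypothetical stopping ratio: a candidate becomes a new seed
    only while its distance to the nearest existing seed exceeds theta times
    the distance between the first two seeds. The number of seeds returned
    serves as k.
    """
    centers = [points[0]]
    # Distance from every point to its nearest seed chosen so far.
    nearest = [math.dist(p, centers[0]) for p in points]
    first_gap = None
    while True:
        idx = max(range(len(points)), key=nearest.__getitem__)
        gap = nearest[idx]
        if gap == 0.0:
            break  # every point coincides with an existing seed
        if first_gap is None:
            first_gap = gap  # distance between the first two seeds
        elif gap <= theta * first_gap:
            break  # remaining points are all close to some seed
        centers.append(points[idx])
        nearest = [min(d, math.dist(p, points[idx]))
                   for d, p in zip(nearest, points)]
    return centers


def kmeans(points, centers, iterations=20):
    """Plain Lloyd-style K-means started from the given seeds."""
    centers = [tuple(c) for c in centers]
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for p in points:
            j = min(range(len(centers)),
                    key=lambda i: math.dist(p, centers[i]))
            clusters[j].append(p)
        # Move each center to the mean of its members; keep it if empty.
        new_centers = [
            tuple(sum(axis) / len(members) for axis in zip(*members))
            if members else c
            for members, c in zip(clusters, centers)
        ]
        if new_centers == centers:
            break  # assignments are stable
        centers = new_centers
    return centers, clusters


# Synthetic stand-in data: three compact 2-D groups (not the thesis's data).
rng = random.Random(42)
blobs = [(0.0, 0.0), (10.0, 0.0), (5.0, 9.0)]
data = [(cx + rng.gauss(0, 0.3), cy + rng.gauss(0, 0.3))
        for cx, cy in blobs for _ in range(200)]

sample = uniform_sample(data, fraction=0.2)   # seed selection uses the sample only
seeds = max_min_centers(sample, theta=0.5)    # k = len(seeds)
centers, clusters = kmeans(data, seeds)
print(len(seeds), centers)
```

Selecting seeds on the sample rather than on the full data set keeps the seed search cheap, while the final K-means pass still uses every point. The choice of `theta` controls how many seeds are accepted, so in practice it would need tuning for the data at hand.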
Keywords/Search Tags: Geological disasters, Canopy algorithm, Cluster analysis, CMU-kmeans algorithm