
ISODATA Model And Its Application On Gap Statistics

Posted on: 2019-11-09   Degree: Master   Type: Thesis
Country: China   Candidate: T Dou   Full Text: PDF
GTID: 2370330551456382   Subject: Applied Mathematics
Abstract/Summary:
The Gap Statistics (GS) method is built on the K-means clustering algorithm. K-means is sensitive to the choice of initial cluster centers and to the specified number of clusters, so Gap Statistics yields only a coarse classification of a data set, not a fine one. To overcome this shortcoming, this thesis introduces the ISODATA algorithm into Gap Statistics.

First, because the ISODATA algorithm requires an initial number of clusters, the relationship between the multidimensional Chebyshev inequality and the Mahalanobis distance is verified, and a method for determining the initial number of clusters is proposed on that basis. An improved ISODATA algorithm, the MISODATA algorithm, is then proposed, followed by the MIGS model. The feasibility and effectiveness of the MIGS model are analyzed empirically: the MIGS model not only achieves a fine classification of the data, but also estimates the optimal number of clusters more accurately than the original GS model.

Second, although the MIGS model has many advantages, the MISODATA algorithm still requires two manually set parameters that control splitting and merging. Suitable values differ from one data set to another, and each parameter ranges from zero to positive infinity, so the values are difficult to determine. To overcome this shortcoming, the concepts of merge degree and split degree are introduced to improve the MISODATA algorithm: an estimation model of the split degree based on the coefficient of variation and a normalized estimation model of the merge degree are proposed, which leads to the FMISODATA algorithm. The convergence speed and convergence accuracy of the FMISODATA algorithm are analyzed, as is the influence of the split degree and merge degree on the stability of the clustering results. The empirical results show that the FMISODATA algorithm preserves the accuracy of the estimated optimal number of clusters while simplifying operation, making it more convenient and effective than the MISODATA algorithm.

Finally, this thesis applies the FMISODATA algorithm to Gap Statistics and proposes the FMIGS model for estimating the optimal number of clusters in a data set. The feasibility and superiority of the FMIGS model are examined empirically. The results show that the FMIGS model reflects more of the characteristics of the data set while maintaining accuracy. The thesis closes with a discussion of the remaining problems in the model and of directions for future research.
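For orientation, the baseline that the thesis starts from can be sketched in a few lines. The block below is a minimal illustration of the standard K-means-based gap statistic, not the MISODATA, MIGS, or FMIGS procedures, whose details the abstract does not give. It assumes a uniform reference distribution over the bounding box of the data and the usual selection rule (the smallest k with Gap(k) >= Gap(k+1) - s(k+1)). The function names `sq_dist`, `kmeans_dispersion`, and `gap_statistic`, and all parameter defaults, are hypothetical choices made for this sketch.

```python
import math
import random

def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans_dispersion(points, k, rng, restarts=3, iters=30):
    """Smallest within-cluster sum of squares W_k over several random K-means starts.

    Assumes k is smaller than the number of distinct points, so W_k > 0.
    """
    best = float("inf")
    for _ in range(restarts):
        centers = rng.sample(points, k)  # random initial centers (illustrative choice)
        for _ in range(iters):
            # Assignment step: put each point in the group of its nearest center.
            groups = [[] for _ in range(k)]
            for p in points:
                j = min(range(k), key=lambda c: sq_dist(p, centers[c]))
                groups[j].append(p)
            # Update step: move each center to its group mean; keep it if the group is empty.
            updated = [
                tuple(sum(col) / len(g) for col in zip(*g)) if g else centers[j]
                for j, g in enumerate(groups)
            ]
            if updated == centers:
                break
            centers = updated
        w = sum(min(sq_dist(p, c) for c in centers) for p in points)
        best = min(best, w)
    return best

def gap_statistic(points, k_max, n_refs=8, seed=0):
    """Return (estimated k, list of Gap(k) for k = 1..k_max)."""
    rng = random.Random(seed)
    dims = list(zip(*points))
    lows = [min(d) for d in dims]
    highs = [max(d) for d in dims]
    gaps, errs = [], []
    for k in range(1, k_max + 1):
        log_w = math.log(kmeans_dispersion(points, k, rng))
        ref_logs = []
        for _ in range(n_refs):
            # Reference data: uniform over the bounding box of the observed data.
            ref = [tuple(rng.uniform(lo, hi) for lo, hi in zip(lows, highs))
                   for _ in points]
            ref_logs.append(math.log(kmeans_dispersion(ref, k, rng)))
        mean_ref = sum(ref_logs) / n_refs
        sd = math.sqrt(sum((v - mean_ref) ** 2 for v in ref_logs) / n_refs)
        gaps.append(mean_ref - log_w)
        errs.append(sd * math.sqrt(1 + 1 / n_refs))
    # Selection rule: smallest k with Gap(k) >= Gap(k+1) - s(k+1).
    for k in range(1, k_max):
        if gaps[k - 1] >= gaps[k] - errs[k]:
            return k, gaps
    return k_max, gaps

# Synthetic demo data (made up for this sketch): three Gaussian blobs in the plane.
demo_rng = random.Random(1)
blob_centers = [(0.0, 0.0), (10.0, 0.0), (5.0, 9.0)]
data = [(cx + demo_rng.gauss(0, 0.5), cy + demo_rng.gauss(0, 0.5))
        for cx, cy in blob_centers for _ in range(30)]
k_hat, gaps = gap_statistic(data, k_max=5)
print("estimated number of clusters:", k_hat)
```

The `restarts` loop is there because K-means depends on its initial centers, which is the sensitivity the abstract cites as motivation. Note also that the candidate range `k_max` has to be supplied by hand in this baseline.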
Keywords/Search Tags:Gap Statistics, Cluster numbers, K-means algorithm, ISODATA algorithm, Multidimensional Chebyshev's inequality, MIGS Model, Coefficient of Variation, FMIGS Model