
Incomplete Data Fuzzy Clustering Methods Based On Consistency

Posted on: 2018-07-31
Degree: Master
Type: Thesis
Country: China
Candidate: X R Xu
GTID: 2348330536961554
Subject: Control theory and control engineering
Abstract/Summary:
In real-world applications, data sets often contain missing values because of environmental and human factors, and these missing values affect the analysis of the data. Fuzzy clustering of incomplete data is currently an active research topic. However, most existing algorithms work well only on data sets with weak correlation; on data sets with strong correlation they fail to achieve good results. To address this, this thesis introduces consistency as a similarity measure and proposes several clustering methods. The main work is as follows:

1. Border data are more likely to be misclassified than core data. Therefore, a consistency-based re-classification method for border data is proposed. First, fuzzy c-means is used to obtain preliminary clustering results. Then, the nearest neighbor rule is used to select data that are probably misclassified, and the membership degrees of the selected data are modified using consistency. Because the method takes consistency into account, the accuracy of the clustering results is improved. Experimental results indicate that the new algorithm performs well on data sets with strong correlation and overlapping clusters.

2. Considering the uncertainty of missing values, a consistency-based pseudo-nearest-neighbor interval description is proposed and applied to incomplete data clustering. The new algorithm searches for pseudo-nearest neighbors through consistency and uses their corresponding attribute values to transform missing values into intervals. Complete attribute values are likewise transformed into intervals whose upper and lower limits are equal. Clustering is then performed on the transformed data set. Using consistency to search for neighbors has several advantages. First, the distribution of attribute values is obtained directly from the original data set. Second, the interval description fully reflects the uncertainty of the missing values. In addition, it ensures that the selected neighbors exhibit more similar attribute features. Experimental results indicate that the new algorithm achieves good clustering results both on artificial data sets with strong correlation and heavy overlap and on real data sets.

3. Furthermore, sample weighting is introduced to improve the above algorithm. The modified algorithm reduces the influence of outliers, which improves the accuracy of the clustering results. Experimental results indicate that the modified algorithm not only reduces misclassification but also shortens running time.
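Method 1 starts from standard fuzzy c-means (FCM). The following is a minimal pure-Python sketch of FCM on complete data, for illustration only: it alternates the usual centre update and membership update until memberships stop changing. The function name `fcm`, its parameters (fuzzifier `m`, tolerance `tol`, `seed`) and the random initialization are assumptions of this sketch, not details taken from the thesis.

```python
import math
import random


def fcm(X, c, m=2.0, max_iter=100, tol=1e-6, seed=0):
    """Standard fuzzy c-means on complete data (illustrative sketch).

    X: list of samples (lists of floats); c: number of clusters; m > 1: fuzzifier.
    Returns (centers, U), where U[k][i] is the membership of sample k in cluster i.
    """
    rng = random.Random(seed)
    n, dim = len(X), len(X[0])
    # Random initial memberships, each row normalised to sum to 1.
    U = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(c)]
        total = sum(row)
        U.append([r / total for r in row])
    p = 2.0 / (m - 1.0)
    centers = []
    for _ in range(max_iter):
        # Centre update: v_i = sum_k u_ki^m x_k / sum_k u_ki^m
        centers = []
        for i in range(c):
            w = [U[k][i] ** m for k in range(n)]
            sw = sum(w)
            centers.append([sum(w[k] * X[k][d] for k in range(n)) / sw for d in range(dim)])
        # Membership update: u_ki = 1 / sum_j (d_ki / d_kj)^(2 / (m - 1))
        new_U = []
        for k in range(n):
            dist = [math.dist(X[k], v) for v in centers]
            if min(dist) == 0.0:
                # Sample sits exactly on a centre: share membership among those centres.
                hits = [1.0 if d == 0.0 else 0.0 for d in dist]
                new_U.append([h / sum(hits) for h in hits])
            else:
                new_U.append([1.0 / sum((dist[i] / dist[j]) ** p for j in range(c))
                              for i in range(c)])
        # Stop when no membership changed by more than tol.
        shift = max(abs(a - b) for old, new in zip(U, new_U) for a, b in zip(old, new))
        U = new_U
        if shift < tol:
            break
    return centers, U


# Usage: two well-separated groups of points.
X = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
centers, U = fcm(X, c=2)
labels = [max(range(2), key=lambda i: row[i]) for row in U]
```

Taking the cluster with the largest membership (`labels`) gives the hard partition that method 1 then inspects for possibly misclassified border data.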
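The border-data step of method 1 can be sketched as below, under clearly stated assumptions. The abstract defines neither consistency nor the exact nearest neighbor rule, so the hypothetical `consistency` function (Pearson correlation of two attribute vectors, rescaled to [0, 1]) and the majority-vote flagging rule in the hypothetical `reclassify_border` are illustrative stand-ins, not the thesis's method. A flagged sample's membership row is replaced by a consistency-weighted mean of its neighbours' rows, so it still sums to 1.

```python
import math
from collections import Counter


def consistency(a, b):
    """HYPOTHETICAL consistency score in [0, 1]: Pearson correlation of the two
    attribute vectors, rescaled from [-1, 1]. A stand-in, not the thesis's definition."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    if sa == 0.0 or sb == 0.0:
        return 0.5  # correlation undefined for constant vectors: treat as neutral
    return (cov / (sa * sb) + 1.0) / 2.0


def hard_label(row):
    """Index of the cluster with the largest membership."""
    return max(range(len(row)), key=lambda i: row[i])


def reclassify_border(X, U, k=3):
    """Flag a sample as possible border data when the majority hard label of its
    k nearest (Euclidean) neighbours differs from its own, then replace its
    membership row by the consistency-weighted mean of those neighbours' rows.
    Returns a new membership matrix; U itself is not modified."""
    n = len(X)
    labels = [hard_label(row) for row in U]
    new_U = [row[:] for row in U]
    for i in range(n):
        neighbours = sorted((j for j in range(n) if j != i),
                            key=lambda j: math.dist(X[i], X[j]))[:k]
        majority, _ = Counter(labels[j] for j in neighbours).most_common(1)[0]
        if majority == labels[i]:
            continue  # agrees with its neighbourhood: treat as core data
        weights = [consistency(X[i], X[j]) for j in neighbours]
        total = sum(weights)
        if total == 0.0:
            continue
        new_U[i] = [sum(w * U[j][c] for w, j in zip(weights, neighbours)) / total
                    for c in range(len(U[i]))]
    return new_U


# Usage: sample 3 lies among cluster-0 points but was given a cluster-1 membership.
X = [[1.0, 2.0, 3.0], [1.1, 2.1, 3.1], [0.9, 2.0, 3.2], [1.0, 2.2, 3.0],
     [5.0, 4.0, 3.0], [5.1, 4.0, 2.9], [4.9, 4.1, 3.0]]
U = [[0.9, 0.1], [0.85, 0.15], [0.8, 0.2], [0.3, 0.7],
     [0.1, 0.9], [0.15, 0.85], [0.2, 0.8]]
U2 = reclassify_border(X, U)
```

Only sample 3 is flagged here; every other row is returned unchanged.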
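For method 2, the sketch below shows one way to turn an incomplete data set (`None` marks a missing value) into interval data: each observed value x becomes the degenerate interval [x, x], and each missing value becomes [min, max] of that attribute over the `q` donor samples that score highest against the incomplete sample. The donor score (Pearson correlation over co-observed attributes), the parameter `q`, and the endpoint distance `interval_sq_dist` (one common choice for clustering interval data) are assumptions for illustration; the thesis's pseudo-nearest-neighbor search may differ.

```python
import math


def pearson(a, b):
    """Pearson correlation; returns 0.0 when either vector is constant."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return 0.0 if sa == 0.0 or sb == 0.0 else cov / (sa * sb)


def to_intervals(X, q=2):
    """Convert incomplete data into interval data (illustrative sketch).

    Observed x -> (x, x). Missing attribute d -> (min, max) of attribute d over the
    q samples observing d with the highest (HYPOTHETICAL) consistency score, here
    the Pearson correlation over attributes observed in both samples.
    """
    result = []
    for i, x in enumerate(X):
        row = []
        for d, value in enumerate(x):
            if value is not None:
                row.append((value, value))
                continue
            scored = []
            for j, y in enumerate(X):
                if j == i or y[d] is None:
                    continue
                shared = [t for t in range(len(x)) if x[t] is not None and y[t] is not None]
                if len(shared) >= 2:
                    score = pearson([x[t] for t in shared], [y[t] for t in shared])
                    scored.append((score, j))
            scored.sort(reverse=True)  # best-scoring donors first
            donors = [X[j][d] for _, j in scored[:q]]
            if not donors:
                raise ValueError(f"no donor observes attribute {d} for sample {i}")
            row.append((min(donors), max(donors)))
        result.append(row)
    return result


def interval_sq_dist(a, b):
    """Squared distance between interval samples: squared gaps between lower
    bounds plus squared gaps between upper bounds, summed over attributes."""
    return sum((al - bl) ** 2 + (au - bu) ** 2 for (al, au), (bl, bu) in zip(a, b))


# Usage: sample 4 follows the increasing pattern of samples 0 and 1.
X = [[1.0, 2.0, 3.0, 4.0],
     [2.0, 3.0, 4.0, 5.0],
     [4.0, 3.0, 2.0, 1.0],
     [5.0, 4.0, 3.0, 2.0],
     [1.5, 2.5, None, 4.5]]
intervals = to_intervals(X, q=2)
```

Because the score rewards a matching attribute pattern rather than raw closeness, the missing third attribute of sample 4 is described by the interval spanned by samples 0 and 1, which share its increasing trend.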
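Method 3 introduces sample weighting. In sample-weighted FCM, each sample's contribution to a cluster-centre update is multiplied by a weight w_j, so low-weight samples such as outliers pull the centres less. The sketch below shows that centre update together with a hypothetical density-based weight (`sample_weights`) that shrinks for isolated points; the abstract does not state which weighting scheme the thesis uses, so this scheme and the neighbour count `k` are assumptions.

```python
import math


def sample_weights(X, k=2):
    """HYPOTHETICAL density-based weights: w_j = 1 / (1 + mean distance from x_j
    to its k nearest neighbours), rescaled to sum to n. Isolated samples
    (likely outliers) receive small weights."""
    n = len(X)
    raw = []
    for j in range(n):
        dists = sorted(math.dist(X[j], X[t]) for t in range(n) if t != j)
        raw.append(1.0 / (1.0 + sum(dists[:k]) / k))
    total = sum(raw)
    return [n * r / total for r in raw]


def weighted_center(X, u, w, m=2.0):
    """Centre update of sample-weighted FCM for one cluster:
    v = sum_j w_j * u_j^m * x_j / sum_j w_j * u_j^m, where u_j are memberships."""
    coef = [wj * uj ** m for wj, uj in zip(w, u)]
    total = sum(coef)
    return [sum(c * x[d] for c, x in zip(coef, X)) / total for d in range(len(X[0]))]


# Usage: four points of a unit square plus one far outlier, all with membership 1.
X = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0], [20.0, 20.0]]
w = sample_weights(X)
plain = weighted_center(X, [1.0] * 5, [1.0] * 5)  # ordinary mean: [4.4, 4.4]
robust = weighted_center(X, [1.0] * 5, w)         # pulled back towards the square
```

With equal weights the outlier drags the centre to (4.4, 4.4); with the density-based weights the centre stays inside the unit square.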
Keywords/Search Tags: Consistency, Incomplete Data Set, Fuzzy c-means, Sample Weighting