Incomplete Data Fuzzy Clustering Methods Based On Consistency

Posted on:2018-07-31

Degree:Master

Type:Thesis

Country:China

Candidate:X R Xu

Full Text:PDF

GTID:2348330536961554

Subject:Control theory and control engineering

Abstract/Summary:


In real-world applications, data sets often contain missing values because of environmental and human factors, which complicates their analysis. Fuzzy clustering of incomplete data is currently an active research topic. However, most existing algorithms work well only on data sets with low correlation and do not achieve good results on strongly correlated data. To address this, this thesis introduces consistency as a similarity measure and proposes several clustering methods based on it. The main contributions are:

1. Border data are more likely to be misclassified than core data. A new border-data reclassification method based on consistency is therefore proposed. The method first uses fuzzy c-means to obtain preliminary clustering results. The nearest-neighbor rule is then used to select data that are probably misclassified, and the membership degrees of the selected data are modified according to consistency. Because the method takes consistency into account, the accuracy of the clustering results improves. Experimental results indicate that the new algorithm performs well on data sets with strong correlation and overlap.

2. Considering the uncertainty of missing values, a pseudo-nearest-neighbor interval description based on consistency is proposed and applied to incomplete data clustering. The new algorithm searches for pseudo-nearest neighbors through consistency and uses their corresponding attribute values to transform missing values into intervals. Complete attribute values are likewise transformed into intervals with equal upper and lower limits. Clustering is then conducted on the transformed data set. Using consistency to search for neighbors has several advantages. First, the distribution of attribute values can be obtained directly from the original data set. Second, the interval description fully reflects the uncertainty of the missing values. Third, the selected neighbors are guaranteed to exhibit more similar attribute features. Experimental results indicate that the new algorithm achieves good clustering results both on artificial data sets with strong correlation and heavy overlap and on real data sets.

3. Sample weighting is further introduced to modify the algorithm above. The modified algorithm reduces the influence of outliers, which improves the accuracy of the clustering results. Experimental results indicate that the modified algorithm not only reduces misclassification but also shortens running time.
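The pseudo-nearest-neighbor interval description summarized in the abstract can be illustrated with a minimal Python sketch. The abstract does not define the consistency measure, so this sketch substitutes a plain partial-distance neighbor search for it; the function names `partial_distance` and `to_intervals`, the parameter `q` (number of neighbors), and the demo data are all hypothetical and are not taken from the thesis. Missing values (NaN) become intervals spanning the neighbors' values for that attribute, and observed values become intervals with equal lower and upper limits.

```python
import numpy as np

def partial_distance(a, b):
    """Squared distance over attributes observed in both rows,
    rescaled to the full dimension. Stand-in for the thesis's
    consistency measure, which the abstract does not define."""
    shared = ~np.isnan(a) & ~np.isnan(b)
    if not shared.any():
        return np.inf  # no common attributes: rows are not comparable
    diff = a[shared] - b[shared]
    return a.size / shared.sum() * np.dot(diff, diff)

def to_intervals(X, q=2):
    """Return (lower, upper) arrays of the same shape as X.
    Observed entries get lower == upper. Each missing entry gets the
    [min, max] of that attribute over the q nearest rows that observe it.
    An entry stays NaN if no comparable row observes the attribute."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    lower, upper = X.copy(), X.copy()
    for i in range(n):
        missing = np.flatnonzero(np.isnan(X[i]))
        if missing.size == 0:
            continue
        dist = np.array([np.inf if k == i else partial_distance(X[i], X[k])
                         for k in range(n)])
        order = np.argsort(dist)
        for j in missing:
            # Keep only comparable neighbours that actually observe attribute j.
            cand = [k for k in order
                    if np.isfinite(dist[k]) and not np.isnan(X[k, j])][:q]
            if cand:
                vals = X[cand, j]
                lower[i, j], upper[i, j] = vals.min(), vals.max()
    return lower, upper

# Hypothetical demo data: the second row is missing its second attribute.
X_demo = np.array([[1.0, 2.0],
                   [1.1, np.nan],
                   [0.9, 2.2],
                   [5.0, 9.0]])
lower, upper = to_intervals(X_demo, q=2)
print(lower[1], upper[1])
```

In this demo the two nearest rows to the incomplete row are the first and third, so its missing attribute is described by the interval spanned by their values, while the distant fourth row has no influence. An interval-valued clustering step would then run on `lower` and `upper`; that step is not shown here.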

Keywords/Search Tags:

Consistency, Incomplete Data Set, Fuzzy c-means, Sample Weighting


Related items

1	Research Of Weighted Clustering Algorithm For Incomplete Data Based On Adaptive Interval
2	Research Of Hybrid Clustering Algorithm For Incomplete Data Based On Local Weighting
3	Study On Clustering For Incomplete Data Based On Sample Weighting And Cluster Dispersion
4	Study On Incomplete Data Clustering Method Based On Correlation Of Sample Neighbors
5	Research Of Fuzzy Clustering Algorithm For Incomplete Data Based On Information Feedback Rbf Network Valuation
6	Research Of Fuzzy Clustering Algorithm For Incomplete Data Based On Improved VAEGAN
7	Research Of Fuzzy Clustering Algorithm For Incomplete Data Based On Improved BP Imputation
8	Research On Fuzzy Clustering Algorithm Of Sample And Feature Weighting
9	Research Of Fuzzy Clustering Algorithm For Optimizing Incomplete Data Based On Extreme Learning Machine
10	Research On Sample Data Based Fuzzy Rules Extraction Method And Its Application