
Research On Semi-Supervised Clustering Algorithms Based On Rough Set

Posted on: 2019-04-14
Degree: Master
Type: Thesis
Country: China
Candidate: R H Liu
GTID: 2428330578972020
Subject: Computer software and theory
Abstract/Summary:
In recent years, data mining technology has developed rapidly, and semi-supervised learning has become a key topic in pattern recognition and machine learning. Supervised learning algorithms rely on a large amount of labeled data, but labeled data is difficult to obtain in many scenarios, so the generalization ability of these algorithms cannot be guaranteed. Semi-supervised learning requires only a small amount of labeled data to classify a large amount of unlabeled data, which makes it more widely applicable.

Rough set theory is a data analysis method for dealing with incomplete, inaccurate, and uncertain data. Rough sets can classify samples through the indiscernibility relation between them and support knowledge discovery through approximations of the target. In this thesis, rough set theory is applied to semi-supervised clustering: a small number of labeled samples and a large number of unlabeled samples are clustered based on the indiscernibility relations on different attributes, which helps to uncover more information in individual dimensions. Two semi-supervised clustering algorithms based on rough set theory are proposed, one for non-sparse data and one for high-dimensional sparse data, in order to overcome the limitations that algorithms based on traditional distance measures face on high-dimensional data.

The main research work of this thesis is as follows:

1) For non-sparse data, a semi-supervised clustering algorithm based on the indiscernibility relation (ER-SSC) is proposed. ER-SSC clusters through a weak indiscernibility relation defined by the neighborhood rough set and the quantitative rough set. The neighborhood radius is searched dynamically over multiple iterations, and ambiguous (fuzzy) sample points are filtered out. For high-dimensional non-sparse data, the final result is obtained by weighting the results from multiple randomly selected sets of attributes.

2) For high-dimensional sparse data, a semi-supervised clustering algorithm based on attribute selection (FS-SSC) is proposed. FS-SSC requires a small number of key attributes to be given for each class and clusters around representative points constructed from those key attributes. It uses a distance based on the key attributes to measure the similarity between sample points and representative points, extends the key attribute set of every class using each clustering result, and in this way gradually improves the clustering quality.
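
To make the neighborhood rough set idea behind ER-SSC concrete, the following is a minimal sketch of the standard neighborhood lower and upper approximations, restricted to a chosen set of attributes. It is not the thesis's ER-SSC algorithm: the function names `neighborhood` and `approximations`, the Euclidean distance, the fixed radius, and the toy data are all assumptions made for illustration.

```python
import math

def neighborhood(points, i, radius, attrs):
    """Indices of samples within `radius` of sample i, measured on `attrs` only.

    Hypothetical helper: the distance is Euclidean, which is an assumption.
    """
    xi = points[i]
    result = set()
    for j, xj in enumerate(points):
        dist = math.sqrt(sum((xi[a] - xj[a]) ** 2 for a in attrs))
        if dist <= radius:
            result.add(j)
    return result

def approximations(points, target, radius, attrs):
    """Neighborhood lower and upper approximations of `target` (a set of indices).

    Lower: samples whose whole neighborhood lies inside the target.
    Upper: samples whose neighborhood overlaps the target at all.
    """
    lower, upper = set(), set()
    for i in range(len(points)):
        nb = neighborhood(points, i, radius, attrs)
        if nb <= target:
            lower.add(i)
        if nb & target:
            upper.add(i)
    return lower, upper

# Toy data (made up): samples 0-2 are labeled as the target class.
points = [(0.0, 0.0), (0.1, 0.0), (0.2, 0.1), (1.0, 1.0), (1.1, 0.9), (0.4, 0.2)]
target = {0, 1, 2}

lower, upper = approximations(points, target, radius=0.3, attrs=(0, 1))
boundary = upper - lower  # samples that cannot be assigned with certainty

print("lower:", sorted(lower))
print("upper:", sorted(upper))
print("boundary:", sorted(boundary))
```

In this sketch the boundary region (upper minus lower) plays the role of the ambiguous samples, and changing `radius` or `attrs` changes which samples fall into it. How the thesis actually searches the radius or filters fuzzy points is not specified in the abstract.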
Keywords/Search Tags: semi-supervised learning, clustering, rough set, high dimension, sparse data