Research On Clustering Method And Semi-supervised Method Based On Hybrid K-nearest-neighbor Graph

Posted on: 2019-03-13
Degree: Master
Type: Thesis
Country: China
Candidate: Y K Qin
GTID: 2370330566986169
Subject: Pattern Recognition and Intelligent Systems
Abstract/Summary:
With the advent of the Internet of Things, the volume of available data is exploding. Most of this data is unlabeled, however, and labeling massive amounts of it requires considerable manpower and material resources. This is why semi-supervised and unsupervised methods have received extensive attention from researchers: they make it possible to complete machine learning tasks using only a few labeled samples, or none at all. Most existing clustering and semi-supervised methods have difficulty processing complex nonlinear data sets. To remedy this deficiency, this thesis proposes a novel data model, the Hybrid K-Nearest-Neighbor (HKNN) graph, which combines the advantages of the mutual k-nearest-neighbor graph and the k-nearest-neighbor graph, to represent nonlinear data sets. It further proposes a clustering method based on the HKNN graph (CHKNN) and a semi-supervised method based on the HKNN graph (SSLHKNN).

The second chapter introduces two extensively studied graph models, the k-nearest-neighbor graph and the mutual k-nearest-neighbor graph, analyzes the methods based on them, and then proposes the hybrid k-nearest-neighbor graph.

The third chapter introduces the CHKNN method. CHKNN first generates several small, tight subclusters and then merges them by exploiting the connectivity among them. To select the optimal parameters for CHKNN, an internal validity index termed the K-Nearest-Neighbor Index (KNNI) is also proposed; it can additionally be used to evaluate the validity of nonlinear clustering results. Experimental results on synthetic and real-world data sets, as well as on video clustering, demonstrate a significant performance improvement over existing nonlinear clustering methods and internal validity indices.

The fourth chapter introduces the SSLHKNN method. The method makes full use of the information in a small number of labeled data points: it labels and merges the initially generated subclusters, and then spreads the labels to the remaining unlabeled data points according to connectivity and neighbor relationships. Experimental results on synthetic and real-world data sets demonstrate a significant performance improvement over existing nonlinear semi-supervised methods...
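
The abstract does not give the construction of the HKNN graph, so the following is only a minimal sketch of the two standard graphs it builds on: the k-nearest-neighbor graph and the mutual k-nearest-neighbor graph. All function names are hypothetical, Euclidean distance is an assumption, and the connected-components helper is included only to illustrate what "connectivity" means on such graphs. It is not the CHKNN or SSLHKNN procedure.

```python
from math import dist  # Euclidean distance (assumed metric), Python 3.8+


def knn_lists(points, k):
    """For each point index, return the set of its k nearest other points."""
    neighbors = []
    for i, p in enumerate(points):
        # Sort the other indices by distance; ties are broken by index.
        others = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: (dist(p, points[j]), j),
        )
        neighbors.append(set(others[:k]))
    return neighbors


def knn_graph(points, k):
    """Undirected kNN graph: edge (i, j) if j is in kNN(i) or i is in kNN(j)."""
    nn = knn_lists(points, k)
    return {(min(i, j), max(i, j)) for i in range(len(points)) for j in nn[i]}


def mutual_knn_graph(points, k):
    """Mutual kNN graph: edge (i, j) only if j is in kNN(i) and i is in kNN(j)."""
    nn = knn_lists(points, k)
    return {
        (min(i, j), max(i, j))
        for i in range(len(points))
        for j in nn[i]
        if i in nn[j]
    }


def connected_components(n, edges):
    """Group vertex indices 0..n-1 into connected components (union-find)."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in edges:
        parent[find(i)] = find(j)
    groups = {}
    for v in range(n):
        groups.setdefault(find(v), []).append(v)
    return sorted(groups.values())


if __name__ == "__main__":
    # Two tight groups plus one point lying between them (illustrative data).
    points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (5, 5)]
    print(connected_components(len(points), knn_graph(points, 2)))
    print(connected_components(len(points), mutual_knn_graph(points, 2)))
```

On this toy data the mutual graph keeps only reciprocated neighbor relations, so the in-between point stays isolated, while the plain kNN graph attaches it to one of the groups. This difference in strictness is presumably the trade-off a hybrid of the two graphs is meant to balance, though the abstract does not say how.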
Keywords/Search Tags:Hybrid k-nearest-neighbor graph, Non-linear data set, Clustering method, Internal validity index, Semi-supervised method