Research On Clustering Method And Semi-supervised Method Based On Hybrid K-nearest-neighbor Graph

Posted on:2019-03-13

Degree:Master

Type:Thesis

Country:China

Candidate:Y K Qin

GTID:2370330566986169

Subject:Pattern Recognition and Intelligent Systems

Abstract/Summary:

With the advent of the Internet of Things, the amount of data available to people is exploding. However, these data are often unlabeled, and labeling massive amounts of data takes a great deal of manpower and material resources. This is why semi-supervised and unsupervised methods have received extensive attention from researchers, who want to complete machine learning tasks using only a few labeled samples or none at all. Most existing clustering and semi-supervised methods have difficulty processing complex nonlinear data sets. To remedy this deficiency, this thesis proposes a novel data model termed the Hybrid K-Nearest-Neighbor (HKNN) graph, which combines the advantages of the mutual k-nearest-neighbor graph and the k-nearest-neighbor graph, to represent nonlinear data sets. In addition, a clustering method based on the HKNN graph (CHKNN) and a semi-supervised method based on the HKNN graph (SSLHKNN) are proposed.

The second chapter introduces two graph models that have been extensively studied, the k-nearest-neighbor graph and the mutual k-nearest-neighbor graph, and analyzes the methods based on them. It then proposes the hybrid k-nearest-neighbor graph.

The third chapter introduces the CHKNN method. CHKNN first generates several small, tight subclusters and then merges them by exploiting the connectivity among them. To select the optimal parameters for CHKNN, we further propose an internal validity index termed the K-Nearest-Neighbor Index (KNNI), which can also be used to evaluate the validity of nonlinear clustering results. Experimental results on synthetic and real-world data sets, as well as on video clustering, demonstrate a significant performance improvement over existing nonlinear clustering methods and internal validity indices.

The fourth chapter introduces the SSLHKNN method. The method makes full use of the information in a small number of labeled data points: it labels and merges the initially generated subclusters, and then spreads the labels to the remaining unlabeled data points according to connectivity and neighbor relationships. Experimental results on synthetic and real-world data sets demonstrate a significant performance improvement over existing nonlinear semi-supervised methods...
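
As a rough illustration of the two base graphs named in the abstract, the following Python sketch builds a k-nearest-neighbor graph and a mutual k-nearest-neighbor graph with NumPy. It assumes the common textbook definitions (an edge in the k-NN graph if either point is among the other's k nearest neighbors, an edge in the mutual graph only if both are) and Euclidean distance. The function names and the toy data are hypothetical, and the abstract does not say how the HKNN graph combines the two, so that step is deliberately not shown.

```python
import numpy as np

def knn_adjacency(points, k):
    """Directed k-NN relation: adj[i, j] is True when j is one of the
    k nearest neighbors of i (Euclidean distance, requires k < n)."""
    points = np.asarray(points, dtype=float)
    n = len(points)
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)  # a point is never its own neighbor
    nearest = np.argsort(dist, axis=1)[:, :k]
    adj = np.zeros((n, n), dtype=bool)
    adj[np.repeat(np.arange(n), k), nearest.ravel()] = True
    return adj

def knn_graph(adj):
    """Undirected k-NN graph: edge if either endpoint lists the other."""
    return adj | adj.T

def mutual_knn_graph(adj):
    """Mutual k-NN graph: edge only if both endpoints list each other."""
    return adj & adj.T

# Toy 1-D data (assumed for illustration): three nearby points and one outlier.
points = [[0.0], [1.0], [3.0], [10.0]]
directed = knn_adjacency(points, k=1)
knn = knn_graph(directed)
mutual = mutual_knn_graph(directed)

print("k-NN edges:", int(knn.sum()) // 2)           # 3
print("mutual k-NN edges:", int(mutual.sum()) // 2)  # 1
```

On this toy data the outlier at 10 is still linked to the point at 3 in the k-NN graph but has no edges in the mutual k-NN graph. That is the usual trade-off between the two: the k-NN graph stays well connected, while the mutual graph keeps only tight, reciprocated neighborhoods.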

Keywords/Search Tags:

Hybrid k-nearest-neighbor graph, Non-linear data set, Clustering method, Internal validity index, Semi-supervised method

Related items

1	Research On Approximate Nearest Neighbor Search Algorithm Based On Graph
2	Research On Semi-supervised Graph Neural Network Method Based On Graph Fusion
3	Research On Semi-supervised Classification Algorithm Based On Graph
4	Efficient Clustering Algorithm For Large-Scale Single-Cell Transcriptome Data
5	Application Of Nearest Neighbor Clustering And MCP In K-Arm DNA Computing
6	Recognition Of Essential Proteins Based On Improved Edge Clustering Coefficient And K-nearest Neighbor Algorithm
7	The Research On Graph-based Clustering Analysis
8	Research On Semi-supervised Graph Node Classification Method Fused With Local Neighborhood Information
9	Clustering Research On Single Cell RNA Sequencing Data
10	Graph Based Semi-supervised Sentiment Classification