Clustering analysis is the process of grouping data; its goal is to extract hidden information from the data according to their structure and to obtain a reasonable partition. Clustering analysis has become a very effective tool in the fields of data mining and machine learning.

Spectral clustering, as a relatively new branch of clustering analysis, has received considerable attention and development over the past decade. Its success is ascribed not only to its ease of implementation, but also to its ability to map a data set from the original space into a low-dimensional feature space in which the data become linearly separable; a traditional clustering algorithm can then be applied in the new feature space. In addition, researchers favor spectral clustering because it rests on a strong theoretical foundation and is closely connected to many other areas, which has brought it more and more applications. However, spectral clustering must store the entire similarity matrix and perform its eigendecomposition, which consumes a large amount of time and space and makes it impractical for large-scale data.

To reduce the complexity of spectral clustering, many effective methods have been proposed in recent years. Among them, one of the most effective is the Nystrom extension, which approximates the eigenstructure of the original data space from a small portion of sampled points, effectively alleviating the time and space problem. The sampling algorithm is the most important aspect of the Nystrom extension.

In this paper, we summarize recent studies on spectral clustering and focus on the sampling schemes of the Nystrom extension applied to spectral clustering. Several sampling schemes have been proposed, but they concentrate on the theoretical analysis of the matrix approximation error; the impact of the sampled set on clustering performance has not been studied. We argue that the matrix approximation error has no direct impact on clustering performance. By using the landmark points to predict the labels of the remaining points, we introduce a concept called predictive ability, and through a loss analysis we propose an incremental sampling algorithm. Experimental results show that the proposed incremental sampling scheme outperforms existing sampling schemes on various clustering tasks while remaining comparable in efficiency.
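To make the Nystrom idea concrete, the following is a minimal sketch of a Nystrom-approximated spectral embedding in Python with numpy, assuming a Gaussian similarity kernel and plain uniform random sampling of the landmark points; the function name, the kernel choice, and the sampling step are illustrative assumptions and do not correspond to the incremental scheme developed later in this paper.

```python
import numpy as np

def nystrom_spectral_embedding(X, n_landmarks, n_components, sigma=1.0, seed=0):
    """Sketch of a Nystrom-approximated spectral embedding.

    Landmarks are drawn uniformly at random here; this sampling step is
    exactly the part that dedicated sampling schemes aim to improve.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    idx = rng.choice(n, size=n_landmarks, replace=False)   # landmark indices

    # Gaussian similarities between all n points and the m landmarks (n x m)
    d2 = ((X[:, None, :] - X[idx][None, :, :]) ** 2).sum(axis=-1)
    C = np.exp(-d2 / (2.0 * sigma ** 2))
    W = C[idx, :]                                           # m x m landmark block

    # Eigendecompose only the small landmark block ...
    evals, evecs = np.linalg.eigh(W)
    top = np.argsort(evals)[::-1][:n_components]
    evals, evecs = evals[top], evecs[:, top]

    # ... and extend its eigenvectors to all points (Nystrom extension):
    # approximate eigenvectors of the full similarity matrix are C V diag(1/lambda).
    U = C @ evecs / np.maximum(evals, 1e-12)

    # Row-normalize so that k-means can be run on the embedded points.
    U /= np.linalg.norm(U, axis=1, keepdims=True) + 1e-12
    return U
```

An ordinary k-means (for example, sklearn.cluster.KMeans) would then be run on the rows of the returned embedding to obtain the final cluster labels. Only the m x m landmark block is ever eigendecomposed, which is the source of the time and space savings over exact spectral clustering.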