Clustering analysis, a powerful data analysis tool, has been widely used in data mining and machine learning related tasks. Currently, there exists a large number of clustering algorithm. Among which, spectral clustering attracts more and more attention in recently due to its mathematical foundation and many successful applications.Compared with other clustering algorithms, spectral clustering is easy to understand. It transforms a complicated clustering problem which may be unable to solve into a different form that could be easily solved. Employing existing libraries for eigen-decompostion, its implementation is quite easy. Additionally, a wealthy of work related to the analysis of spectral clustering has been proposed in the past years. However, both the time (eigen decomposition) and space (storage of similarity matrix) complexities inhibit the applicability of spectral clustering to large scale problems.To alleviate the memory and computational burdens of spectral clustering for large scale problems, some kind of low-rank matrix approximation is usually employed. Nystrom method is an efficient technique to generate low rank matrix approximation and its most important aspect is sampling. The matrix approximation errors of several sampling schemes have been theoretically analyzed for a number of learning tasks.In this work, we firstly propose a brief review of the current work related to spectral clustering, especially the recently proposed sampling methods. However, the impact of matrix approximation error on the clustering performance of spectral clustering has not been studied. In this paper, we firstly analyze the performance of Nystrom method in terms of clusterability, thus answer the impact of matrix approximation error on the clustering performance of spectral clustering. Our analysis immediately suggests an incremental sampling scheme for the Nystrom method based spectral clustering. Experimental results show that the proposed incremental sampling scheme outperforms existing sampling schemes on various clustering tasks and image segmentation applications, and its efficiency is comparable with existing sampling schemes. |