| Support Vector Machines (SVM) are a relatively new methodology for pattern recognition based on statistical learning theory, and both their theory and their algorithms have developed considerably in recent years. Support Vector Clustering (SVC) is a novel clustering method that maps samples from the input space to a high-dimensional feature space via kernel functions. Although its performance improves on that of other algorithms, the time complexity of conventional SVM grows exponentially as the dataset grows. Reducing this time complexity, so that the method can serve as a data mining tool, is therefore a focus of research. This paper studies the methodology of SVC and proposes a novel SVC: Minimum Spanning Trees Smoothing Support Vector Machine. The main contributions of this paper are as follows. First, by analyzing the algorithm for searching support vectors and the characteristics of clustering, an improved algorithm is proposed that uses a smoothing method to search for support vectors. This algorithm converts the constrained quadratic optimization problem into an unconstrained one so that it can be solved by conventional methods. It greatly improves performance, saves storage space, and reduces time complexity while maintaining the precision of the support vectors. Experiments show that the algorithm further reduces the optimization time. Second, this paper studies the labeling method of SVC and minimum spanning tree (MST) clustering, and proposes a labeling method based on MST. It improves the distance measure by analyzing the distribution characteristics of high-dimensional data. The similarities among points in feature space are well represented, so the discrimination between samples is magnified and the reliability of clustering is increased. Moreover, this algorithm greatly reduces the time complexity. 
The experimental results demonstrate that it is simpler and needs less time than other algorithms. Third, combining these two algorithms, we propose a novel clustering algorithm called MST-SSVC. Through analysis of its parameters, it greatly reduces time complexity while preserving precision, which makes it possible to handle huge datasets with SVC. Finally, MST-SSVC is applied to the clustering of social pension insurance survey data, and it yields reasonable results. |
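
The abstract does not give the details of the MST-based labeling step, so the following is only a minimal sketch of the general idea of MST clustering, not the authors' algorithm. It assumes plain Euclidean distance in the input space (the paper uses an improved distance in feature space), and the names `mst_edges`, `mst_labels`, and the threshold `cut_length` are hypothetical, introduced purely for illustration. Points are joined by a minimum spanning tree, edges longer than the threshold are dropped, and each remaining connected component becomes one cluster.

```python
import math


def mst_edges(points):
    """Prim's algorithm on the complete Euclidean graph.

    Returns the MST as a list of (weight, i, j) tuples.
    Assumption: Euclidean distance in input space, for illustration only.
    """
    n = len(points)
    if n == 0:
        return []
    in_tree = [False] * n
    best = [math.inf] * n   # cheapest known connection of each point to the tree
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        # Pick the point outside the tree that is cheapest to attach.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((best[u], parent[u], u))
        # Relax connection costs of the remaining points through u.
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v] = d
                    parent[v] = u
    return edges


def mst_labels(points, cut_length):
    """Assign cluster labels by cutting MST edges longer than cut_length.

    cut_length is a hypothetical threshold parameter; labels are numbered
    0, 1, 2, ... in order of first appearance.
    """
    n = len(points)
    root = list(range(n))

    def find(i):
        # Union-find lookup with path halving.
        while root[i] != i:
            root[i] = root[root[i]]
            i = root[i]
        return i

    for weight, i, j in mst_edges(points):
        if weight <= cut_length:
            root[find(i)] = find(j)

    ids = {}
    return [ids.setdefault(find(i), len(ids)) for i in range(n)]


if __name__ == "__main__":
    sample = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11)]
    print(mst_labels(sample, cut_length=2.0))
```

This version builds the tree in O(n²) time and keeps only O(n) working state, which is adequate for a sketch. A feature-space variant would replace `math.dist` with a kernel-induced distance.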