Cluster analysis is an important ingredient in data mining, machine learning and pattern recognition. Its basic idea is to divide a set of data into several clusters based on the similarity between data points, so that data points in the same cluster are as similar as possible while data points in different clusters are as dissimilar as possible. Cluster analysis has a wide range of applications in many fields, such as image processing, bioinformatics and social networking. After decades of research, many classical clustering methods have been proposed, such as k-means clustering, Gaussian mixture model clustering, spectral clustering and subspace clustering. These methods often require the number of clusters to be specified before execution, but determining the number of clusters for an unlabeled data set is a challenging task. Based on regularization techniques and regression methods, this dissertation systematically studies the optimization models, theory and algorithms of fusion regularized clustering methods that do not require the number of clusters to be given in advance.

In this dissertation, we propose a sparse group lasso fusion regularized clustering method for high-dimensional data, which can simultaneously partition observations and eliminate noninformative features. Based on the convexity and separability of the regularization term, we theoretically provide a bound on the prediction error and further establish feature selection consistency. Moreover, a semi-proximal alternating direction method of multipliers is designed to solve the proposed fusion regularized clustering model, and its convergence is established. Finally, experimental results on both synthetic and gene expression data sets illustrate that the proposed method has excellent performance in terms of clustering and feature selection.

To overcome the disadvantage that convex regularization terms may lead to biased estimates, we propose a nonconvex discontinuous l0 fusion regularized
clustering model based on the l0 penalty function. Theoretically, we first analyze the existence of optimal solutions of the model and deduce an upper bound on the regularization parameter. Then we establish the relationships among Karush-Kuhn-Tucker (KKT) points, α-stationary points and local optimal solutions. Moreover, based on the α-stationary point, we prove that the distances among different cluster centers are greater than a positive threshold. Computationally, we solve the model via the alternating direction method of multipliers, whose limit point is an α-stationary point and a local optimal solution. Finally, we conduct extensive experiments on both synthetic and real data sets. Experimental results show the outstanding performance of this method in comparison with several state-of-the-art clustering methods.

For the above l0 fusion regularized clustering model, we design a second-order optimization algorithm with fast convergence and low computational complexity. To the best of our knowledge, this is the first second-order algorithm for the nonconvex fusion regularized clustering framework. Note that the l0 fusion regularized clustering model boils down to a composite row sparsity regularized (cRSR) minimization problem. For broad applicability, we systematically study the optimality conditions, optimization algorithms and applications of the cRSR minimization problem. In theory, we establish the relationships among critical stationary points, α-stationary points, strong α-stationary points and local/global optimal solutions. Then, we derive a crucial stationary equation from the (strong) α-stationary point. In terms of numerical computation, we design an easy-to-implement Newton algorithm based on the stationary equation, and establish its quadratic convergence rate and iteration complexity under mild assumptions. Finally, the proposed Newton algorithm is applied to the l0 fusion regularized clustering model and the trend filtering problem. Extensive experimental
results demonstrate that our Newton algorithm has good numerical performance.
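To make the fusion regularization idea concrete, the following is a minimal sketch of a convex fusion (sum-of-norms) clustering objective in one dimension, minimized by plain gradient descent on a smoothed penalty. The toy data, the smoothing parameter `eps` and the step size are illustrative assumptions, not the dissertation's actual model or its semi-proximal ADMM solver; the sketch only shows the key mechanism, namely that the pairwise fusion penalty pulls centroids of similar observations together without the number of clusters being given in advance.

```python
import numpy as np

def objective(u, x, lam, eps=1e-2):
    """Smoothed sum-of-norms clustering objective (1-D):
    0.5*||u - x||^2 + lam * sum_{i<j} sqrt((u_i - u_j)^2 + eps)."""
    fit = 0.5 * np.sum((u - x) ** 2)
    d = u[:, None] - u[None, :]
    pen = np.sum(np.triu(np.sqrt(d ** 2 + eps), k=1))
    return fit + lam * pen

def fusion_cluster(x, lam=0.3, eps=1e-2, step=0.02, iters=4000):
    """Gradient descent on the smoothed objective; each data point
    gets its own centroid u_i, and the fusion penalty drags nearby
    centroids together ("fuses" them) while the fit term keeps the
    well-separated groups apart."""
    u = x.copy()
    for _ in range(iters):
        d = u[:, None] - u[None, :]
        grad_pen = np.sum(d / np.sqrt(d ** 2 + eps), axis=1)
        u = u - step * ((u - x) + lam * grad_pen)
    return u

# two well-separated groups of three points each
x = np.array([0.0, 0.1, 0.2, 5.0, 5.1, 5.2])
u = fusion_cluster(x)
# centroids within each group become nearly identical, while the
# two groups stay clearly separated -- the cluster count emerges
# from the data rather than being supplied up front
```

In the actual methods of the dissertation the penalty is applied to multivariate centroid differences (group lasso norms) or replaced by the discontinuous l0 penalty, and far more efficient solvers (semi-proximal ADMM, Newton-type methods) are used; this smoothed gradient sketch is only the simplest working instance of the fusion principle.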
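The effect of a discontinuous l0 fusion penalty can likewise be illustrated on a one-dimensional signal. For an ordered sequence, the model min_u 0.5*||x - u||^2 + lam * #{i : u_{i+1} != u_i} is the classical Potts segmentation problem, which admits an exact O(n^2) dynamic-programming solution. This stand-in is not the ADMM or Newton algorithm studied in the dissertation, and the data and lam values below are assumptions; it only demonstrates the characteristic behavior of l0 fusion, i.e. exact, unbiased piecewise-constant fits whose number of pieces is controlled by the regularization parameter.

```python
import numpy as np

def potts_segmentation(x, lam):
    """Exact minimizer of 0.5*||x - u||^2 + lam * (number of jumps in u)
    for a 1-D signal x, via dynamic programming over segment ends."""
    n = len(x)
    s1 = np.concatenate([[0.0], np.cumsum(x)])        # prefix sums
    s2 = np.concatenate([[0.0], np.cumsum(x ** 2)])   # prefix sums of squares

    def seg_cost(j, i):
        """0.5 * sum of squared deviations of x[j..i] from its mean."""
        m = i - j + 1
        tot = s1[i + 1] - s1[j]
        return 0.5 * ((s2[i + 1] - s2[j]) - tot * tot / m)

    E = np.full(n + 1, np.inf)   # E[i]: best energy for x[0..i-1]
    E[0] = -lam                  # offsets the per-segment lam below
    prev = np.zeros(n + 1, dtype=int)
    for i in range(1, n + 1):
        for j in range(1, i + 1):   # last segment is x[j-1..i-1]
            c = E[j - 1] + lam + seg_cost(j - 1, i - 1)
            if c < E[i]:
                E[i] = c
                prev[i] = j - 1
    # backtrack: fill each segment with its mean
    u = np.empty(n)
    i = n
    while i > 0:
        j = prev[i]
        u[j:i] = x[j:i].mean()
        i = j
    return u

x = np.array([1.0, 1.0, 1.0, 5.0, 5.0, 5.0])
u_small = potts_segmentation(x, 0.1)    # recovers the two pieces exactly
u_large = potts_segmentation(x, 100.0)  # one piece: the global mean 3.0
```

Because the l0 penalty charges a constant price per jump rather than a price proportional to jump size, the recovered piecewise-constant levels are the unshrunk segment means; this is exactly the unbiasedness property that motivates the nonconvex model over its convex counterparts.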