| Different types of data call for different clustering approaches. Depending on whether the data are organized into views, they can be divided into single-view data and multi-view data. Multi-view data are widely used because the views complement one another, which makes the problem of clustering multi-view data increasingly important. The main work of this paper is as follows. First, for the multi-view clustering problem, this paper extends the K-means algorithm into a regularized K-means algorithm that handles multi-view data. Specifically, controlling the regularization term reduces overfitting, and computing the indicator matrix yields the clustering result and the cluster centers directly. This paper also presents methods and algorithms for clustering single-view data. Second, the current trend is to store big data in the cloud, which makes multi-view data available in online form. However, the problems of online multi-view data clustering and online updating remain unsolved. To solve them, this paper proposes an online regularized K-means clustering method. Specifically, non-negative matrix factorization is used as the starting point of the model to find the indicator matrix and the cluster centers in the online multi-view dataset; for online updating, an online update step is proposed to improve accuracy and speed. This paper also gives a method for handling online single-view data. Third, in some multi-view data the number of features is much larger than the sample size, which makes clustering or online clustering time-consuming and leaves the sparsity between and within views uncontrolled. To address this problem, this paper proposes a feature selection method for multi-view data. Specifically, parametrics are used for feature selection and a regularization term is added to avoid overfitting; after feature selection, the multi-view data still retain the maximum number of features while the running time of the algorithm is reduced. Fourth, to verify the effectiveness of the above algorithms, simulated and real data analyses are carried out to test their clustering performance, and stability and sensitivity analyses are conducted, with line graphs showing how changes in different parameters affect the clustering results. The results show that the clustering methods proposed in this paper outperform traditional and state-of-the-art methods in numerical analysis. Finally, this paper also develops an R package, ORKM, which contains all the clustering methods proposed in this paper and can be used to process online multi-/single-view datasets and multi-/single-view datasets. Numerical results show that ORKM clusters more effectively than other R packages and significantly reduces the cost of reproducing the algorithms in this paper, making it easier for readers to obtain clustering results. |
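
To make the roles of the indicator matrix and the regularization term more concrete, the following is a minimal Python sketch of a regularized K-means over several views that share one indicator matrix. It is an illustration under stated assumptions, not the ORKM implementation: the function name `multiview_kmeans`, the ridge penalty `lam` on the cluster centers, and the objective (the sum over views of the squared reconstruction error plus `lam` times the squared norm of the centers) are all hypothetical choices made for this example.

```python
import numpy as np


def multiview_kmeans(views, k, lam=0.1, n_iter=50, seed=0):
    """Toy regularized multi-view K-means with a shared indicator matrix.

    Assumed objective (illustrative only, not the paper's model):
        sum_v ||X_v - G F_v||_F^2 + lam * ||F_v||_F^2
    views : list of arrays, each of shape (n_samples, d_v)
    Returns (labels, centers), where centers[v] has shape (k, d_v).
    """
    rng = np.random.default_rng(seed)
    n = views[0].shape[0]
    labels = rng.integers(0, k, size=n)  # random initial assignment
    for _ in range(n_iter):
        G = np.eye(k)[labels]  # one-hot indicator matrix, shape (n, k)
        # Ridge-regularized centers: F_v = (G^T G + lam I)^{-1} G^T X_v
        A = G.T @ G + lam * np.eye(k)
        centers = [np.linalg.solve(A, G.T @ X) for X in views]
        # Squared distance from each sample to each center, summed over views
        dist = sum(
            ((X[:, None, :] - F[None, :, :]) ** 2).sum(axis=2)
            for X, F in zip(views, centers)
        )
        new_labels = dist.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break  # assignments stopped changing
        labels = new_labels
    return labels, centers
```

In this sketch every view votes on the assignment through the summed distances, so a sample is placed in the cluster that fits it best across all views at once. The penalty `lam` shrinks the centers toward zero and keeps the linear system solvable even when a cluster is empty.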