| Multi-view data are data that come from different channels or appear in multiple modalities. Such data contain both consistent and diverse information about the objects they describe, which helps us understand those objects comprehensively. It is therefore natural to exploit the characteristics of multi-view data to improve clustering performance. The goal of multi-view clustering is to use the characteristics of multiple views to divide data samples into groups so that samples in the same group are more similar to each other than to samples in other groups. In recent years, multi-view clustering has attracted extensive attention. Although previous multi-view clustering models have achieved great success, they still have some shortcomings. For example, most existing methods use only the feature content or only the graph structure of the samples, and so cannot explore the clustering information hidden across all samples. Many methods use graph structures that are built directly from the original features and remain fixed throughout the optimization process, which may lead to suboptimal performance because of noise in the data samples. To address these problems, we propose a Consensus Guided Graph Autoencoder (CGGA) algorithm. In addition, because real-world data samples are often incomplete, we propose an Incomplete Multi-view Subspace Clustering with Optimal Graph Structure Learning (OGSL) algorithm to handle the incomplete multi-view clustering problem. The main work of the thesis is summarized as follows: (1) We propose a Consensus Guided Graph Autoencoder (CGGA) to efficiently perform complete multi-view clustering. First, we learn a new feature matrix for each view by using graph autoencoders, so that both structure information and node features are incorporated during learning. Second, we learn a set of omic-specific similarity matrices together with a consensus matrix based on the features obtained in the first step. The 
learned omic-specific similarity matrices are then fed back to the graph autoencoders to guide the feature learning. By iterating these two steps, our method obtains a final consensus similarity matrix. To evaluate the clustering performance of the model comprehensively, we compare CGGA with several approaches ranging from general-purpose multi-view clustering algorithms to multi-omics-specific integrative methods. Experimental results on machine learning data sets and cancer data sets demonstrate the superiority of the model. In addition, we verify the validity of the model by using multiple omics data sets to identify cancer subtypes. Finally, we investigate the clinical significance of the glioblastoma clusters obtained and provide new insights into the treatment of patients with different subtypes. (2) We propose an Incomplete Multi-view Subspace Clustering with Optimal Graph Structure Learning (OGSL) model, a jointly optimized incomplete multi-view clustering framework that combines latent representation learning, spectral embedding, and graph clustering. Specifically, the OGSL model first learns the latent representation of each view via Low-Rank Representation and obtains a low-dimensional spectral embedding matrix. We then force the graph reconstructed from the spectral embedding of each view to approximate a global similarity graph. In addition, a rank constraint is imposed on the Laplacian matrix of the global graph to ensure that the learned global similarity matrix has the optimal clustering structure. To optimize the objective function effectively, we design an Augmented Lagrange Multiplier (ALM) algorithm with alternating direction minimization. The performance of the OGSL model is verified on seven benchmark data sets against seven classical incomplete multi-view clustering algorithms. To explore the task of cancer subtype identification, we also applied OGSL to incomplete multi-view cancer datasets, and the 
effectiveness of the method in the scenario of missing cancer samples was validated. |
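
The rank constraint on the Laplacian of the global graph can be illustrated with a small sketch. The abstract does not state the exact form of the constraint; the sketch assumes the commonly used condition rank(L) = n - c, where n is the number of samples and c the number of clusters, which holds exactly when the graph has c connected components. The function names and the toy similarity matrix are hypothetical and are not taken from the thesis.

```python
import numpy as np

def graph_laplacian(S):
    """Unnormalized Laplacian L = D - W of the symmetrized similarity matrix."""
    W = (S + S.T) / 2.0
    return np.diag(W.sum(axis=1)) - W

def count_components(S, tol=1e-8):
    """Count near-zero Laplacian eigenvalues, one per connected component."""
    eigvals = np.linalg.eigvalsh(graph_laplacian(S))
    return int(np.sum(np.abs(eigvals) < tol))

# Toy (assumed) similarity matrix: samples 0-2 form one block, samples 3-4 another.
S = np.zeros((5, 5))
S[:3, :3] = 1.0
S[3:, 3:] = 1.0
np.fill_diagonal(S, 0.0)

n = S.shape[0]
c = count_components(S)
rank_L = np.linalg.matrix_rank(graph_laplacian(S))
print(c, rank_L)
```

Under this assumption, a similarity matrix whose Laplacian satisfies the rank condition already encodes the cluster assignment through its connected components, so no separate discretization step is needed on the toy graph above.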