| With the implementation of The Cancer Genome Atlas (TCGA) project and the development of next-generation sequencing technology, massive amounts of complex biological omics data have been generated. These omics data contain a wealth of genetic information related to biological functions and gene regulation. How to explore massive omics data and extract the key data that carry important biological information is a current research hotspot. Traditional matrix factorization methods have achieved satisfactory results in different fields, but matrix-based data representations have an obvious shortcoming: a matrix model cannot fully explore the multi-dimensional spatial structure of omics data, so it cannot effectively mine multi-view fusion information, which limits method performance. The third-order tensor has attracted attention because of its structural characteristics. Data processing methods based on third-order tensors largely preserve the three-dimensional structure of the data and can therefore explore the information hidden in multi-dimensional data. To explore multi-view omics data more efficiently, this paper improves the existing tensor robust principal component analysis (TRPCA) method and applies it to cancer omics data. The specific research content is as follows: (1) To address the high dimensionality, redundancy, and manifold structure of cancer omics data, a hyper-graph regularized tensor robust principal component analysis method (HTRPCA) based on the L1 norm is proposed. First, this method explores the association among multiple sample points by imposing a hyper-graph regularization constraint on the objective function of TRPCA, and fully mines the complementary information between different data types. Second, the sparsity constraint in the objective function can filter out redundant information in the
original data and improve the performance of the algorithm to a certain extent. Finally, HTRPCA is used in sample clustering experiments on cancer omics data, providing a reference method for discovering new cancer subtypes. (2) To address the problem that the low-rank tensor obtained by TRPCA decomposition may be damaged, a tensor robust principal component analysis model with low-rank weight constraints (WTRPCA) based on the L2,1 norm is proposed. The damaged low-rank data are repaired by imposing an additional penalty term on the TRPCA model. Specifically, a weight tensor is set according to the error values generated during tensor decomposition, and each weight is assigned to the corresponding element of the low-rank tensor to compensate the low-rank data. In addition, the L2,1 norm is used to constrain the sparse tensor and improve sparsity. In the experiments, the low-rank tensor obtained by WTRPCA decomposition is used to cluster cancer samples and explore the similarities and associations among multiple diseases. WTRPCA may provide new ideas for future cancer research and treatment. (3) To address the problem that TRPCA cannot fully model different types of noise when restoring low-rank tensors, a new method based on the L2,1 norm, tensor robust principal component analysis with double constraints (DCTRPCA), is proposed. First, the logical function is introduced to constrain the sparse components obtained by tensor decomposition; then the L2,1 norm is used to strengthen the constraint on sparse tensors with weights. Applying the two constraints together increases sparsity while restoring the low-rank data. The framework aims to find the maximum likelihood estimate of the sparse error tensor and to restore the low-rank components as fully as possible. Finally, the low-rank components are used for cancer sample clustering, and the sparse components are used for feature selection to
screen more differentially expressed genes. This is of great significance for directly studying epigenetic phenomena at the gene level and exploring the pathogenesis of diseases. The methods proposed in this paper effectively consider the manifold structure in the data and obtain more comprehensive omics information while effectively handling noise and outliers. The results of sample clustering and feature selection experiments show that the methods proposed in this paper are superior to other similar methods. |
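
To illustrate the two sparsity penalties named in the abstract, the following minimal Python sketch computes the L1 and L2,1 norms of a single matrix (for example, one frontal slice of a sparse error tensor) and applies their standard proximal operators. It is not the authors' implementation: the function names, the column-wise convention for the L2,1 norm, the threshold `tau`, and the toy matrix `E` are all assumptions made for illustration.

```python
import math


def l1_norm(matrix):
    """L1 norm: sum of the absolute values of all entries."""
    return sum(abs(x) for row in matrix for x in row)


def l21_norm(matrix):
    """L2,1 norm (column-wise convention, assumed here):
    sum of the Euclidean norms of the columns."""
    n_cols = len(matrix[0])
    return sum(
        math.sqrt(sum(row[j] ** 2 for row in matrix)) for j in range(n_cols)
    )


def soft_threshold(matrix, tau):
    """Proximal operator of tau * L1 norm:
    shrink every entry toward zero by tau (entry-wise sparsity)."""
    return [
        [math.copysign(max(abs(x) - tau, 0.0), x) for x in row]
        for row in matrix
    ]


def column_shrink(matrix, tau):
    """Proximal operator of tau * L2,1 norm:
    scale each column by max(1 - tau / ||column||, 0) (column-wise sparsity)."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    out = [[0.0] * n_cols for _ in range(n_rows)]
    for j in range(n_cols):
        norm = math.sqrt(sum(matrix[i][j] ** 2 for i in range(n_rows)))
        scale = max(1.0 - tau / norm, 0.0) if norm > 0 else 0.0
        for i in range(n_rows):
            out[i][j] = scale * matrix[i][j]
    return out


# Toy matrix (hypothetical): column 0 holds a large error, column 1 small noise.
E = [[3.0, 0.1],
     [4.0, -0.2]]

entrywise = soft_threshold(E, 1.0)   # zeros individual small entries
columnwise = column_shrink(E, 1.0)   # zeros the whole low-energy column
```

In this toy case both operators suppress the small-noise column, but the L2,1 operator removes it as a unit and rescales the large column without changing its direction, which is one common motivation for using the L2,1 norm when errors are expected to be concentrated in particular samples or features.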