Robust Tensor Clustering For High-dimensional Data

Posted on: 2024-06-30
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y Hu
Full Text: PDF
GTID: 1528307184480464
Subject: Computer Science and Technology
Abstract/Summary:
Clustering is a vital research area in unsupervised machine learning and data mining that seeks to identify the underlying structure of data without external expert labels. Clustering has numerous applications in data-to-knowledge, multi-media data aggregation, and precision medicine with the human genome. With the emergence of modern scenarios like multi-media, biological medicine, and intelligent manufacturing, there is a large amount of high-dimensional, noisy, and multi-view unlabeled complex data that needs to be clustered. However, clustering such complex data often yields inaccurate and unstable results. Recent advances in tensor-based clustering for high-dimensional, noisy, and multi-view data have drawn increasing research attention. Tensor-based clustering can describe high-order multi-wise similarities among data, characterizing a comprehensive spatial structure and providing noise-robust similarity estimation, potentially addressing the challenges stemming from complex data. In this dissertation, we develop tensor-based clustering that addresses the clustering challenges brought by high-dimensional, noisy, and multi-view data. We focus on three research aspects: discriminating tensor multi-wise similarities, low-memory-cost tensor learning, and integrating multi-view tensor-embedded data. Our main contributions are summarized as follows.

(1) A discriminating tensor spectral clustering based on high-dimensional statistical theory is developed. We observe that current tensor multi-wise similarities suffer from the concentration effect, making it hard to obtain reliable high-dimensional clustering performance. To address this issue, we propose anchor-distance-based discriminating tensor spectral clustering to achieve cluster-discriminative high-dimensional clustering. This approach is motivated by, and further justified with, high-dimensional statistical theory. Our experiments on synthetic data and public benchmark datasets, including text, image, and bioinformatics, demonstrate that the proposed discriminating tensor spectral clustering achieves consistent clustering performance improvement compared to traditional tensor spectral clustering. Furthermore, our proposed method mitigates the similarity biases caused by the concentration effect and effectively alleviates noise contamination.

(2) A low-memory-cost deep tensor spectral clustering network based on deep learning is developed. We observe that stochastic optimization in deep learning enables the reduction of computational space complexity. Based on this observation, we propose a tensor spectral clustering network that achieves memory-efficient tensor spectral learning, reducing the memory cost. Furthermore, our proposed tensor spectral clustering network allows us to integrate multiple affinity tensors jointly, enabling efficient tensor integration while keeping a low memory cost. This approach ensures reliable high-dimensional clustering performance with relatively low computational requirements. Our experiments on extensive public benchmark datasets demonstrate that the proposed method consistently outperforms baseline methods while costing less than 1/1000 of what traditional tensor spectral clustering would cost.

(3) A robust multi-view clustering based on manifold geometry is developed. We observe that the signal inconsistency of multi-view high-dimensional features and unevenly distributed noise can be mitigated by exploiting the intrinsic manifold in multi-view tensor embedding. Based on this observation, we propose learning a consensus multi-view tensor embedding based on the Stiefel manifold to achieve robust multi-view clustering. Our experiments on public noise simulation and benchmark datasets demonstrate that the proposed multi-view integration on the Stiefel manifold can obtain a relatively stable multi-view clustering performance under scale-varying noise and achieve consistent performance improvement over baseline methods.

In conclusion, our proposed tensor-based clustering approaches address the challenges of high-dimensional, noisy, and multi-view data by discriminating tensor multi-wise similarities, low-memory-cost tensor learning, and integration of multi-view tensor-embedded data. Our experiments on public benchmark datasets demonstrate that our proposed methods achieve consistent clustering performance improvement compared to baseline methods, mitigating similarity biases, reducing memory costs, and achieving robust multi-view clustering.
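
The concentration effect mentioned under contribution (1) can be illustrated with a small, generic experiment. The sketch below is not the dissertation's method: it only measures how the relative contrast between the farthest and nearest neighbour of a query point shrinks as dimensionality grows, assuming points drawn uniformly from the unit hypercube. The function name `relative_contrast` and all parameter values are hypothetical choices made for illustration.

```python
import math
import random


def relative_contrast(dim, n_points=200, seed=0):
    """Return (d_max - d_min) / d_min for Euclidean distances from one
    random query point to n_points random points in [0, 1]^dim.

    Illustrative assumption: uniform data, a single query point.
    """
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    dists = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(dim)]
        dists.append(math.dist(query, point))
    return (max(dists) - min(dists)) / min(dists)


if __name__ == "__main__":
    # The contrast is expected to drop sharply as the dimension increases,
    # which is why plain distance-based similarities become less informative.
    for dim in (2, 10, 100, 1000):
        print(dim, round(relative_contrast(dim), 3))
```

In low dimensions the nearest point is far closer than the farthest one, whereas in high dimensions all distances become nearly equal, so similarities built directly on them discriminate poorly between clusters.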
Keywords/Search Tags: Unsupervised Machine Learning, Data Mining, High-dimensional Clustering, Multi-view Clustering, Tensor-based Learning