Research On Concept Factorization Algorithms For Data Representation And Clustering

Posted on:2023-07-03

Degree:Doctor

Type:Dissertation

Country:China

Candidate:Y Zhang

Full Text:PDF

GTID:1528306629467234

Subject:Computer Science and Technology

Abstract/Summary:

Concept Factorization (CF), an effective and highly interpretable representation learning method, has attracted extensive attention in machine learning and data mining. However, classical CF algorithms generally suffer from the following problems: (a) they factorize the raw data directly, which makes the models sensitive to noise; (b) they lack a dynamic, adaptive locality-preservation mechanism, which makes the number of neighbors difficult to select; (c) their unsupervised learning mode cannot exploit label information, which limits the feature representation power; and (d) they all have single-layer factorization structures and therefore cannot extract hidden deep features. To address these shortcomings, this dissertation focuses on four technical issues (robust feature learning, adaptive weight construction, semi-supervised factorization mechanisms, and deep feature mining) in order to improve the robustness, locality preservation, feature discrimination, and deep feature extraction capacity of CF models for representation and clustering tasks. Following the technical route "from unsupervised to semi-supervised, from shallow models to deep architectures", it systematically proposes four novel concept factorization algorithms. The main research contents are as follows:

(1) To address the problem that classical Concept Factorization is sensitive to noise and cannot preserve locality adaptively, we propose an unsupervised Robust Flexible Auto-weighted Local-coordinate Concept Factorization (RFA-LCF) framework. The model integrates robust flexible CF, subspace recovery, robust sparse local-coordinate coding, and adaptive weight learning into a unified model. For robust learning, a sparse projection is learned to recover the underlying clean data space, and the flexible CF is then performed in the projective feature space. RFA-LCF also uses an L2,1-norm based flexible residue to encode the mismatch between the recovered data and its reconstruction, and uses robust sparse local-coordinate coding to represent each data point with a few nearby basis concepts. In addition, RFA-LCF adaptively preserves the manifold structures in the basis concept space and the new coordinate space by jointly minimizing the reconstruction errors on the clean data, anchor points, and coordinates. By enhancing the robustness of CF to noisy data, using flexible constraints to measure reconstruction errors, and jointly optimizing locality, the data representation power and clustering ability can be significantly improved.

(2) To address the problem that existing models are not sufficiently robust to noise and cannot use label information to improve performance, we propose a joint label prediction based Robust Semi-Supervised Adaptive Concept Factorization (RS2ACF) framework. RS2ACF integrates robust semi-supervised CF, joint label prediction, subspace recovery, and adaptive locality preservation into a unified model. To obtain robust features, the model minimizes an L2,1-norm based sparse error term. To make full use of partial label information and enhance discrimination, RS2ACF explicitly uses the class information of the labeled data and, more importantly, jointly learns an explicit label indicator for the unlabeled data, which is then used to estimate their labels. In addition, by incorporating the joint neighborhood reconstruction error over the new representations and predicted labels of both labeled and unlabeled data, the manifold structures are preserved in the representation space and the label space at the same time. Because the weights are learned adaptively, the difficult problem of choosing the neighborhood parameter is effectively avoided.

(3) To address the problem that traditional CF methods cannot uncover deep features, we propose a Deep Self-representative Concept Factorization Network (DSCF-Net). To improve data representation and clustering power, the model integrates robust deep concept factorization, deep self-expressive representation, and adaptive locality-preserving feature learning into a unified framework. To uncover hidden deep features, the model uses a hierarchical factorization architecture built from multiple layers of linear transformations. In each layer, the data representation is improved indirectly by optimizing the bases, which can capture high-dimensional information. To make the features robust to sparse noise, DSCF-Net first applies subspace recovery for sparse error correction and then performs the deep factorization in the recovered clean subspace. To obtain locality-preserving representations, the model also introduces an adaptive deep self-representative weighting strategy, which uses the coefficients as adaptive reconstruction weights so that the locality of the representations is preserved at the same time.

(4) To address the limited representation power of existing multi-layer matrix factorization methods, which is caused by unreasonable factorization structures, we propose a dual-constrained Deep Semi-Supervised Coupled Factorization Network (DS2CF-Net). It integrates prior knowledge enrichment, self-expressive discriminative representation, and joint label and structure constraints into a deep semi-supervised coupled factorization framework. A deep coupled factorization strategy is designed to optimize the basis vectors and the representations in each layer in a coupled manner. An error correction mechanism and a feature fusion strategy are introduced, and clustering evaluation modules are added between layers to prevent performance from declining as the number of layers increases. Label prediction is added to the model to enrich the prior knowledge, and joint structure and label constraints are used to improve the discrimination of the features. In addition, adaptive dual-graph learning ensures that the model retains local geometric structure information in both the data space and the feature space.
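
For orientation, the classical single-layer Concept Factorization that the abstract takes as its baseline can be sketched as follows. This is a minimal illustration, not any of the four proposed algorithms: it assumes non-negative data X (features by samples), the model X ≈ X W Vᵀ, and the standard multiplicative update rules for classical CF. The function names `concept_factorization` and `l21_norm` are hypothetical, and the row-wise convention used for the L2,1-norm is an assumption, since the abstract does not state which convention the dissertation follows.

```python
import numpy as np


def l21_norm(M):
    """L2,1-norm as the sum of the L2 norms of the rows of M.

    Assumption: row-wise convention; some papers sum over columns instead.
    """
    return float(np.sqrt((M ** 2).sum(axis=1)).sum())


def concept_factorization(X, k, n_iter=200, eps=1e-10, seed=0):
    """Classical CF sketch: approximate X (d x n) by X @ W @ V.T.

    W (n x k) expresses each basis concept as a non-negative combination
    of the samples; V (n x k) holds the new representation of each sample.
    Assumes X is non-negative so that the kernel K = X.T @ X is too.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[1]
    K = X.T @ X  # n x n linear kernel of the samples
    W = rng.random((n, k))
    V = rng.random((n, k))
    losses = []
    for _ in range(n_iter):
        # Multiplicative updates keep W and V non-negative.
        W *= (K @ V) / (K @ W @ (V.T @ V) + eps)
        V *= (K @ W) / (V @ (W.T @ K @ W) + eps)
        # Squared Frobenius reconstruction error after this iteration.
        losses.append(float(np.linalg.norm(X - X @ W @ V.T, "fro") ** 2))
    return W, V, losses


if __name__ == "__main__":
    X = np.random.default_rng(1).random((20, 30))  # 20 features, 30 samples
    W, V, losses = concept_factorization(X, k=3, n_iter=50)
    labels = V.argmax(axis=1)  # simple cluster assignment from V
    print(losses[0], losses[-1], labels[:10])
    print(l21_norm(X - X @ W @ V.T))  # a sparse-error style residual measure
```

The sketch factorizes the raw data directly, uses no labels, and has a single layer, which are the limitations (a), (c), and (d) listed in the abstract. An L2,1-norm residual such as the one printed at the end is the kind of robust error measure that RFA-LCF and RS2ACF are described as minimizing in place of the plain squared error.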

Keywords/Search Tags:

Feature extraction, concept factorization, deep factorization models, unsupervised learning, semi-supervised learning, clustering analysis

Related items

1	Semi-supervised Low-rank Matrix Learning And Its Applications
2	Nonnegative Matrix Factorization Algorithm Based On The Regularized Method And Its Applications
3	Manifold Learning And Semi-supervised Learning With Applications To Feature Extraction
4	Research On Methods For Unsupervised Feature Learning
5	Semi-supervised Learning On Text Data
6	Semi-supervised Non-negative Matrix Factorization And Its Application In Document Clustering
7	Research On General Non-negative Matrix Factorization Based On Semi-supervised
8	Research On Image Recognition Based On Semi-supervised Non-negative Matrix Factorization
9	Non-negative Local Coordinate Factorization And Its Applications
10	Research On Manifold Embedding Matrix Factorization Algorithm