
Research On Clustering Algorithm Based On Structural Feature Representation And Fusion

Posted on: 2024-07-25
Degree: Master
Type: Thesis
Country: China
Candidate: G Y Wei
Full Text: PDF
GTID: 2568307076468734
Subject: Computer Science and Technology
Abstract/Summary:
The rapid development of the Internet, multimedia, mobile terminals, and related technologies has made it possible to acquire and collect massive amounts of varied data in real time, and how to mine knowledge from that data has become an important issue. Machine learning, which discovers data structures and statistical relationships in order to build models and extract latent information, has received growing attention from researchers in China and abroad. Clustering, an important branch of machine learning, analyzes unlabeled data samples and establishes intrinsic relationships between them, and is widely used in image processing, disease diagnosis, and information retrieval.

Traditional clustering algorithms struggle to capture inter-sample relationships and structure when high-dimensional data are largely unlabeled, inconsistently distributed, and incomplete, which weakens the robustness and accuracy of the model. Two problems stand out: (1) the process of obtaining embedded features considers only the relationships between samples and ignores the category distribution of the samples; (2) the multi-view data that can be collected are missing view information and the relationships between samples across views. This thesis addresses these issues, and the main research work is as follows:

(1) To address the problem that traditional models consider only the relationships between samples and ignore the distribution of unlabeled samples, we propose a semi-supervised deep embedding clustering algorithm with "pairwise constraints", which achieves semi-supervised clustering of data sets with few labeled samples by solving for the optimal embedding. First, the data are mapped to a high-dimensional latent feature space in which the inter-class distance is maximized and the intra-class distance is minimized, strengthening the category distribution. Second, to avoid high feature similarity between samples of different classes, the Kullback-Leibler (KL) divergence is minimized, using the "pairwise constraints" to obtain strong supervisory information. Finally, to solve the semi-supervised clustering problem for high-dimensional data, we introduce an autoencoder network and construct a semi-supervised clustering model with deep embedding. The model makes full use of the nonlinear feature representation capability of the deep network to improve clustering accuracy. The algorithm is validated and compared on five popular image datasets and four UCI datasets, and the experimental results demonstrate its effectiveness and robustness.

(2) A new robust multi-view clustering algorithm with incomplete information is proposed for the Partially Sample-Missing Problem (PSP) and the Partially View-unaligned Problem (PVP), which arise during the construction of multi-view clustering models. The algorithm builds a Siamese (twin) network architecture and introduces inter-layer feature distillation to generate more discriminative missing information, which addresses the PSP; a deep-to-shallow self-distillation technique, supervised by a contrastive loss constraint, strengthens the semantic features of the current view. Further, an attention feature fusion layer lets the attention module automatically learn the similarity of features across views, which addresses the PVP. The algorithm makes full use of the complementarity and consistency of information among views to improve clustering. It is validated and compared on six multi-view datasets, and the experimental results show that it outperforms the compared algorithms on multi-view clustering with missing information in all cases. The code has been open-sourced at: https://github.com/LNNU-computer-research-526/SMDC...
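
The objective described in contribution (1) combines a KL-divergence clustering term with pairwise supervision. The sketch below is one way such a loss could look, not the thesis's actual formulation, which the abstract does not give. It assumes the widely used deep embedded clustering form (Student's t soft assignments and a sharpened target distribution) for the KL term, and a contrastive-style penalty for the must-link and cannot-link pairs. The function names, the `margin` and `lam` parameters, and the toy embeddings are all hypothetical.

```python
import math


def soft_assign(z, centers, alpha=1.0):
    """Student's t soft assignment q_ij of each embedded point to each center."""
    q = []
    for zi in z:
        row = []
        for mu in centers:
            d2 = sum((a - b) ** 2 for a, b in zip(zi, mu))
            row.append((1.0 + d2 / alpha) ** (-(alpha + 1.0) / 2.0))
        total = sum(row)
        q.append([v / total for v in row])
    return q


def target_distribution(q):
    """Sharpened target p_ij proportional to q_ij^2 / f_j, with f_j = sum_i q_ij."""
    k = len(q[0])
    f = [sum(row[j] for row in q) for j in range(k)]
    p = []
    for row in q:
        w = [row[j] ** 2 / f[j] for j in range(k)]
        total = sum(w)
        p.append([v / total for v in w])
    return p


def kl_divergence(p, q, eps=1e-12):
    """KL(P || Q) summed over all samples and clusters."""
    return sum(
        pij * math.log((pij + eps) / (qij + eps))
        for prow, qrow in zip(p, q)
        for pij, qij in zip(prow, qrow)
    )


def pairwise_penalty(z, must_link, cannot_link, margin=1.0):
    """Assumed constraint term: pull must-link pairs together and push
    cannot-link pairs at least `margin` apart in the embedding space."""
    def dist(i, j):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(z[i], z[j])))

    pull = sum(dist(i, j) ** 2 for i, j in must_link)
    push = sum(max(0.0, margin - dist(i, j)) ** 2 for i, j in cannot_link)
    return pull + push


def total_loss(z, centers, must_link, cannot_link, lam=0.1):
    """KL clustering loss plus the weighted pairwise-constraint penalty."""
    q = soft_assign(z, centers)
    p = target_distribution(q)
    return kl_divergence(p, q) + lam * pairwise_penalty(z, must_link, cannot_link)


if __name__ == "__main__":
    # Toy 2-D embeddings standing in for autoencoder outputs (hypothetical data).
    z = [[0.0, 0.0], [0.1, 0.0], [3.0, 3.0], [3.1, 3.0]]
    centers = [[0.0, 0.0], [3.0, 3.0]]
    print(total_loss(z, centers, must_link=[(0, 1)], cannot_link=[(0, 2)]))
```

In a full model the embeddings `z` would come from the encoder half of an autoencoder, and the loss would be minimized by gradient descent over the encoder weights and the cluster centers; the pure-Python version here only evaluates the loss so that each term is easy to inspect.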
Keywords/Search Tags: Semi-supervised clustering, deep auto-encoder network, multi-view clustering, knowledge distillation, attention mechanism