Research And Application Of Multi-view Subspace Clustering Algorithm

Posted on: 2023-11-22
Degree: Master
Type: Thesis
Country: China
Candidate: Y X Qiao
GTID: 2568306818495224
Subject: Computer Science and Technology
Abstract/Summary:
With the advent of the big data era, single-view data can no longer meet practical needs, and multi-view data has emerged. Multi-view data describes the same object from multiple angles or by different measurement standards: it is composed of multiple views, each representing one angle or standard. Compared with single-view data, multi-view data carries richer semantics and more useful information, so it merits study. Clustering, a basic method in unsupervised learning, is well suited to mining the underlying information in data, which makes it a natural tool for studying multi-view data. How to cluster multi-view data effectively has therefore become a research hot spot. Among the many methods, subspace-based multi-view clustering has been widely recognized for its interpretability. Although these methods achieve good clustering results, some problems remain: noise points and outliers degrade the clustering result, and the local structure information of individual views is ignored, which causes information loss. The main work of this thesis is as follows:

(1) To address the problem that the cross-view matching algorithm treats all samples in a dataset as equally important and thus ignores the adverse effects of outliers and noise points on clustering, an adaptive sample-weighted cross-view matching clustering algorithm is proposed. Taking into account the consistency of view geometry and cluster assignment, the proposed algorithm assigns a corresponding weight to each sample. Specifically, the algorithm first assigns the same weight to every sample and then adjusts the weights continuously over the subsequent iterations until the convergence condition is reached. At that point, samples with greater impact have high weights and samples with less impact have low weights, which strengthens the positive influence of important samples and weakens the adverse influence of noise points and outliers. Experimental results show that the proposed algorithm achieves better clustering results than the original algorithm.

(2) To address the shortcoming that the constrained bilinear factorization multi-view subspace clustering algorithm considers only the global structure of the views and ignores the local structure information of individual views, a constrained bilinear factorization multi-view subspace clustering algorithm that learns global and local structures simultaneously is proposed. The algorithm considers the consistency and complementarity of views and explores their underlying data distribution and clustering attributes; it also takes the local structure information of views into account, captures the internal differences of individual views, and thus makes effective use of the available information. In addition, in place of the traditional practice of predefining a similarity matrix for each view, the algorithm adopts adaptive distance regularization, which can improve the clustering effect. Experimental results on multiple datasets show that the algorithm improves multi-view clustering performance.

(3) The two proposed algorithms are applied to text clustering, a field related to information processing and the diversification of media forms. First, the text data is collected and preprocessed. In this experiment, documents and their corresponding pictures are collected as two views. The documents are preprocessed by word segmentation, vector conversion and dimensionality reduction. The image data is processed with a VGG network to generate vectors. Because the generated dataset is high-dimensional and its matrix is sparse, principal component analysis is used to reduce the dimension and obtain the final dataset. Finally, the proposed methods are applied to five-year text clustering on the dataset. The experimental results are evaluated with three metrics. The results show that both algorithms can be applied to text clustering.
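The weighting scheme described in (1), with equal initial weights refined at every iteration until convergence, can be illustrated with a small sketch. This is not the thesis's cross-view matching algorithm: it is a generic single-view, sample-weighted k-means written only to show the idea. The function name `weighted_kmeans`, the exponential weight update, and the parameters `gamma`, `n_iter` and `tol` are all assumptions introduced for this illustration.

```python
import numpy as np

def weighted_kmeans(X, init_centers, gamma=1.0, n_iter=50, tol=1e-8):
    """Hypothetical sketch: k-means with adaptive per-sample weights.

    Not the thesis's method. Samples far from their assigned center
    (likely noise or outliers) receive exponentially smaller weights.
    """
    X = np.asarray(X, dtype=float)
    centers = np.asarray(init_centers, dtype=float).copy()
    n = X.shape[0]
    weights = np.full(n, 1.0 / n)  # every sample starts with the same weight

    for _ in range(n_iter):
        # Squared Euclidean distance from every sample to every center: (n, k).
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        residual = d2[np.arange(n), labels]

        # Assumed update rule: weight decays with the residual; the small
        # constant keeps every weight strictly positive before normalizing.
        new_weights = np.exp(-residual / gamma) + 1e-12
        new_weights /= new_weights.sum()

        # Each center becomes the weighted mean of its assigned samples.
        new_centers = centers.copy()
        for j in range(centers.shape[0]):
            mask = labels == j
            if mask.any():
                new_centers[j] = np.average(X[mask], axis=0, weights=new_weights[mask])

        converged = (np.abs(new_weights - weights).max() < tol
                     and np.allclose(new_centers, centers))
        weights, centers = new_weights, new_centers
        if converged:
            break

    return labels, weights, centers

# Two tight groups plus one far-away point acting as an outlier.
X = [[0, 0], [0.1, 0], [0, 0.1], [5, 5], [5.1, 5], [5, 5.1], [9, 9]]
labels, weights, centers = weighted_kmeans(X, init_centers=[[0, 0], [5, 5]])
```

In this toy run the point at (9, 9) is still assigned to the second cluster, but its weight becomes the smallest of all samples, so it barely shifts that cluster's center. A multi-view version would additionally have to couple the views, which this sketch does not attempt.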
Keywords/Search Tags: multi-view, clustering, sample weighting, local structure information, text clustering