
Clustering Algorithm Analysis Based On GAN And VAE

Posted on: 2022-08-01
Degree: Master
Type: Thesis
Country: China
Candidate: M M Wang
GTID: 2518306605966469
Subject: Communication and Information System
Abstract/Summary:
Clustering is the process of grouping similar data into the same cluster and dissimilar data into different clusters, according to the relevance of the data. As computers penetrate people's daily lives, massive amounts of data are produced on the Internet every day, but most of this data is unlabeled. Manually labeling such volumes of data is time-consuming and laborious, so supervised learning algorithms face a shortage of training data. Using unsupervised clustering algorithms to group samples automatically, based on the similarity between them, is therefore of considerable research significance.

Traditional clustering algorithms can extract only shallow features of the data and cannot effectively mine the deeper nonlinear features hidden in it, and their computational complexity rises sharply as the data scale grows. In contrast, deep clustering algorithms extract more abstract features through deep neural networks, reducing computational complexity and improving clustering accuracy. Deep clustering algorithms mainly use the autoencoder model, and only a small amount of research uses the Generative Adversarial Network (GAN). The autoencoder maps the data domain and the hidden space one-to-one, so its generalization ability is weak: if the model is trained on only a small sample of the data, clustering accuracy on the overall data is poor. The GAN generalizes better than the autoencoder. When the training sample is small or the data is disturbed by a small amount of noise, the GAN performs more stably than the autoencoder. However, research on clustering with GANs is scarce and still in its infancy. This thesis therefore addresses these problems and studies clustering algorithms based on the Generative Adversarial Network and the Variational Auto Encoder (VAE). The main contents are as follows:

1. For the problem of image data clustering, the Info-Cluster-GAN model is proposed. In addition to the adversarial game between the generator and the discriminator of the original GAN, a classifier built from a deep neural network is added. The classifier maps data generated from different hidden space variables to different categories. This operation maximizes the mutual information between the generated data and the hidden variables, so that low-dimensional hidden space variables can be used to represent the categories of high-dimensional data. Once clustering of the generated data has been achieved, the Expectation Maximization algorithm is used to update the classifier parameters and improve the model's clustering accuracy on real samples. In addition, the model uses the Wasserstein distance instead of relative entropy to measure the difference between generated and real samples, which improves convergence. Two methods are used to speed up training. The first is to train with multiple loss functions rather than training each neural network alternately. The second is to share network parameters between the discriminator and the classifier, which reduces network complexity and saves training time. Experiments show that the clustering accuracy of this model improves on that of existing GAN-based clustering models.

2. For the problem of mode collapse in data generated from a single Gaussian noise distribution, a Denoising Variational Auto Encoder is used to generate a mixed Gaussian noise distribution. Specifically, latent variables are used as training data to train the Denoising Variational Auto Encoder. When generating different types of data, the encoder and the latent variables are used to generate mixed Gaussian noise signals with different distributions. Signals sampled from the different Gaussian distributions are used as generator inputs, which avoids the mode collapse caused by a single noise input. A clustering experiment on noisy data was performed with the optimized model. It showed that the model has relatively good resistance to noise and interference, and that its clustering accuracy improves on that of the model before optimization.
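
As a rough illustration of the second contribution, the sketch below draws generator inputs from a mixture of Gaussians rather than from a single Gaussian. It is not the thesis's implementation: the component means and standard deviations are hard-coded, hypothetical values (in the thesis they would come from the trained Denoising Variational Auto Encoder), the components are assumed to be equally weighted, and the function name `sample_mixture_noise` is made up for this example.

```python
import random


def sample_mixture_noise(n_samples, means, stds, seed=0):
    """Draw latent vectors from an equally weighted Gaussian mixture.

    means[k] and stds[k] are per-dimension lists for component k.
    These are hypothetical placeholders, not values from the thesis.
    Returns (labels, vectors), where labels[i] is the component index
    that vectors[i] was drawn from.
    """
    rng = random.Random(seed)
    n_components = len(means)
    labels, vectors = [], []
    for _ in range(n_samples):
        # Pick one component; each component stands for one intended cluster.
        k = rng.randrange(n_components)
        # Sample each latent dimension independently (diagonal covariance).
        z = [rng.gauss(m, s) for m, s in zip(means[k], stds[k])]
        labels.append(k)
        vectors.append(z)
    return labels, vectors


# Three well-separated 2-D components (illustrative values only).
means = [[-4.0, 0.0], [4.0, 0.0], [0.0, 4.0]]
stds = [[0.5, 0.5], [0.5, 0.5], [0.5, 0.5]]
labels, noise = sample_mixture_noise(1000, means, stds)
```

Each vector in `noise` would then be passed to the generator, with `labels` recording which component, and hence which intended category, it came from.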
Keywords/Search Tags: Deep Learning, Generative Adversarial Learning, Variational Auto-encoder, Clustering