Joint t-SNE For Comparable Projections Of Multiple High-Dimensional Datasets

Posted on:2023-10-11

Degree:Master

Type:Thesis

Country:China

Candidate:Y Q Wang

Full Text:PDF

GTID:2568306902485104

Subject:Electronic and communication engineering

Abstract/Summary:

The analysis of high-dimensional datasets is valuable for both research and applications in fields such as recommendation systems, social networks, and bioinformatics. One of the most important analysis tasks is comparing different high-dimensional datasets, which helps people identify their correlations and common patterns. For example, one might compare time series data to discover how a pattern evolves, or compare the activation tensors of different layers in a deep neural network to understand how each layer transforms the internal representations. However, because high-dimensional datasets are so complex, it is nearly impossible to compare them directly. Projecting the datasets into a lower-dimensional space makes it possible to perform comparison tasks and infer the changes between them.

A simple approach is to project each dataset independently. But because the optimization process of many projection techniques is stochastic and unpredictable, this approach often introduces undesirable variations, such as misalignment of identical data points, which makes it unsuitable for comparison tasks. A recent method for comparable projections is Dynamic t-SNE, which adds a loss term to the t-SNE objective function that penalizes the movement of every data point between projections, and optimizes the positions of all data points across all projections at the same time. Although this achieves visual consistency, distortions frequently occur because of the rigid constraints on the absolute position of every single point. And since Dynamic t-SNE takes in the full sequence of datasets at once and optimizes all projections as a whole, it imposes a heavy computational and memory burden, making it unsuitable for projecting streaming data.

To address these limitations, our main contributions are as follows:

(1) We present Joint t-SNE, a novel multi-dimensional projection technique that generates coherent projections of multiple high-dimensional datasets. To this end, we first capture the topological characteristics around each point by employing the Graphlet Frequency Distribution (GFD) in the high-dimensional space. We then introduce an extra loss term, vector constraints, that guides the optimization process to preserve edge vectors between projected points across two data frames.

(2) We quantitatively and qualitatively evaluate Joint t-SNE and show that it is capable of generating projections that satisfy both consistency and fidelity. Joint t-SNE drops the global constraint, which cannot reflect local changes in high-dimensional data, and thereby makes comparison tasks easier. In addition, Joint t-SNE requires only two data frames at a time, substantially lowering the computational cost.

(3) We apply GFD-based similarity and vector constraints to other dimensionality reduction techniques, and demonstrate the high extensibility of our method through extensive experiments.
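The difference between penalizing point movement and preserving edge vectors can be illustrated with a small NumPy sketch. This is not the thesis's actual loss: the abstract gives no formula, so the function names `point_penalty` and `vector_penalty`, the squared-difference form, the uniform default weights, and the hand-picked edge list are all assumptions made for illustration. In the method described above, the edges and their weights would presumably come from the GFD-based similarity.

```python
import numpy as np

def point_penalty(y_prev, y_cur):
    """Assumed Dynamic-t-SNE-style term: squared movement of every point."""
    return float(np.sum((y_cur - y_prev) ** 2))

def vector_penalty(y_prev, y_cur, edges, weights=None):
    """Assumed vector-constraint term: squared change of each edge vector
    y_i - y_j between two frames, optionally weighted per edge."""
    edges = np.asarray(edges)
    i, j = edges[:, 0], edges[:, 1]
    v_prev = y_prev[i] - y_prev[j]   # edge vectors in the previous frame
    v_cur = y_cur[i] - y_cur[j]      # edge vectors in the current frame
    if weights is None:
        weights = np.ones(len(edges))
    return float(np.sum(weights * np.sum((v_cur - v_prev) ** 2, axis=1)))

# Hypothetical 2-D projection of five points and three constrained edges.
rng = np.random.default_rng(0)
y_prev = rng.normal(size=(5, 2))
edges = [(0, 1), (1, 2), (3, 4)]

# Case 1: the whole projection is translated rigidly.
shift = np.array([3.0, -1.0])
y_shifted = y_prev + shift
p_shift = point_penalty(y_prev, y_shifted)          # every point is penalized
v_shift = vector_penalty(y_prev, y_shifted, edges)  # edge vectors unchanged

# Case 2: only point 4 moves, so only the edge (3, 4) changes.
y_local = y_prev.copy()
y_local[4] += np.array([0.5, 0.0])
v_local = vector_penalty(y_prev, y_local, edges)

print(p_shift, v_shift, v_local)
```

Under these assumptions, a rigid translation costs nothing under the edge-vector term while the point-movement term charges every point, and a local change is charged only on the edges that touch the moved point. This is one way to read the claim that vector constraints avoid a rigid constraint on absolute positions.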

Keywords/Search Tags:

high-dimensional data, projection, embedding, t-distributed stochastic neighbor embedding

Related items

1	The 3D Shapes Isometric Deformation Based On Stochastic Neighbor Embedding
2	Discriminant Feature Learning Based On Stochastic Neighbor Embedding
3	Variational Auto-Encoder Combined With T-Distributed Stochastic Neighbor Embedding For Dimensionality Reduction And Cluster Analysis
4	Fast Geodesic Distance Query Based On High-dimensional Embedding
5	Several Algorithms Based On Nonlinear Dimensionality Reduction Of Facial Expression Recognition Research
6	The Study Of Correlation Analysis And Dimensionality Reduction Methods And Their Applications
7	Research On Image Super-resolution Algorithms Through Interpolation And Neighbor Embedding
8	Research On Dimension Reduction Methods Of High Dimensional Data
9	Graph Embedding and Nonlinear Dimensionality Reduction
10	Embedding And Visualization For High Dimensional Unit Data