
t-SNE (t-distributed Stochastic Neighbor Embedding) Explained

Learn how t-SNE works for dimensionality reduction and data visualization, including high-dimensional embeddings, neighborhood preservation, probability distributions, KL divergence, and clustering visualization.

Written by Hitesh Sahu, a passionate developer and blogger.

Tue May 26 2026


t-SNE (t-distributed Stochastic Neighbor Embedding)

t-SNE is a nonlinear dimensionality reduction algorithm that converts high-dimensional data into low-dimensional visualizations while preserving local similarity structure.

The output space is typically:

  • 2D
  • 3D

It is widely used for:

  • clustering visualization
  • embedding visualization
  • feature space analysis
  • latent space exploration

Core Idea

t-SNE preserves:

  • local structure
  • neighborhood similarity

Points that are close in the high-dimensional space tend to stay close in the low-dimensional map. Large distances between far-apart points are not reliably preserved.

t-SNE Visualization Example

flowchart LR

    A1[Cat Images]
    A2[Dog Images]
    A3[Car Images]

    A1 --> B[t-SNE Projection]
    A2 --> B
    A3 --> B

    B --> C[Clustered 2D Visualization]

Why t-SNE is Needed

High-dimensional data is difficult to visualize directly.

Examples:

  • Word embeddings
  • Image embeddings
  • Transformer hidden states
  • Feature vectors

t-SNE converts:

$$\text{High-Dimensional Space} \rightarrow \text{Low-Dimensional Visualization}$$

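In the low-dimensional space, t-SNE uses a heavy-tailed Student t-distribution to:

  • mitigate the crowding problem (a 2D map has too little room to keep all moderately distant points at faithful distances, so they get squeezed together)
  • separate distinct clusters more clearly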
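To see why the heavy tail matters, the short sketch below compares an unnormalized Gaussian kernel (with $\sigma = 1$, an arbitrary choice for illustration) against the Student t kernel $(1 + d^2)^{-1}$ used for $q_{ij}$ at a few distances $d$. The function names are hypothetical.

```python
import math

def gaussian_kernel(d: float) -> float:
    """Unnormalized Gaussian similarity, assuming sigma = 1 for illustration."""
    return math.exp(-d ** 2 / 2.0)

def student_t_kernel(d: float) -> float:
    """Unnormalized Student t similarity with one degree of freedom (the q_ij kernel)."""
    return 1.0 / (1.0 + d ** 2)

for d in [0.0, 1.0, 2.0, 4.0]:
    ratio = student_t_kernel(d) / gaussian_kernel(d)
    print(f"d={d:.0f}  gaussian={gaussian_kernel(d):.4f}  "
          f"student_t={student_t_kernel(d):.4f}  ratio={ratio:.1f}")
```

At $d = 4$ the Student t value is roughly 175 times larger than the Gaussian one. Because the low-dimensional kernel decays so slowly, a moderately similar pair can be modeled by a much larger distance in the map, which is what pushes separate clusters apart.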

Applications of t-SNE

  • NLP Embeddings
  • Image Feature Visualization
  • Transformer Embeddings
  • Clustering Analysis
  • Latent Space Visualization
  • Anomaly Detection

High-Level Workflow

flowchart TD

    A[High-Dimensional Data] --> B[Compute Pairwise Similarities]

    B --> C[Convert to Probability Distribution]

    C --> D[t-SNE Optimization]

    D --> E[2D or 3D Embedding]

    E --> F[Visualization]

Example

Suppose each image is represented by:

$$x_i \in \mathbb{R}^{512}$$

t-SNE reduces it into:

$$y_i \in \mathbb{R}^{2}$$

for visualization.


Step 1: Similarity in High-Dimensional Space

t-SNE converts pairwise distances into probabilities that express similarity.

The conditional probability that $x_i$ would pick $x_j$ as its neighbor is:

$$p_{j|i} = \frac{\exp(-\|x_i - x_j\|^2 / 2\sigma_i^2)}{\sum_{k \ne i} \exp(-\|x_i - x_k\|^2 / 2\sigma_i^2)}$$

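Where:

  • $x_i$ = a data point in the high-dimensional space
  • $\sigma_i$ = bandwidth (standard deviation) of the Gaussian centered at $x_i$, chosen per point so that $P_i$ matches the requested perplexity

The conditional probabilities are then symmetrized into joint probabilities, where $n$ is the number of points:

$$p_{ij} = \frac{p_{j|i} + p_{i|j}}{2n}$$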
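Step 1 can be sketched in NumPy as follows. This is a minimal illustration, assuming NumPy is available: it uses one shared, hand-picked $\sigma$ for every point instead of tuning $\sigma_i$ per point, and it applies the symmetrization $p_{ij} = (p_{j|i} + p_{i|j}) / 2n$ from the original t-SNE formulation. The helper names and the random data are hypothetical.

```python
import numpy as np

def squared_distances(X: np.ndarray) -> np.ndarray:
    """Pairwise squared Euclidean distances ||x_i - x_j||^2."""
    sum_sq = np.sum(X ** 2, axis=1)
    D = sum_sq[:, None] + sum_sq[None, :] - 2.0 * X @ X.T
    return np.maximum(D, 0.0)  # clip tiny negatives caused by floating-point error

def conditional_p(D: np.ndarray, sigmas: np.ndarray) -> np.ndarray:
    """Row i holds p_{j|i} for all j; the k = i term is excluded from the sum."""
    logits = -D / (2.0 * sigmas[:, None] ** 2)
    np.fill_diagonal(logits, -np.inf)              # exp(-inf) = 0, so p_{i|i} = 0
    logits -= logits.max(axis=1, keepdims=True)    # stabilizes exp; cancels in the ratio
    P = np.exp(logits)
    return P / P.sum(axis=1, keepdims=True)

def joint_p(P_cond: np.ndarray) -> np.ndarray:
    """Symmetrize: p_ij = (p_{j|i} + p_{i|j}) / (2n)."""
    n = P_cond.shape[0]
    return (P_cond + P_cond.T) / (2.0 * n)

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 512))                      # six hypothetical 512-d vectors
D = squared_distances(X)
off_diag = ~np.eye(6, dtype=bool)
sigmas = np.full(6, np.sqrt(np.median(D[off_diag])))  # one shared sigma, illustration only
P = joint_p(conditional_p(D, sigmas))
print(P.sum())  # 1.0 up to floating-point rounding
```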

Step 2: Similarity in Low-Dimensional Space

In the low-dimensional space, similarity uses a Student t-distribution with one degree of freedom:

$$q_{ij} = \frac{(1 + \|y_i - y_j\|^2)^{-1}}{\sum_{k \ne l} (1 + \|y_k - y_l\|^2)^{-1}}$$

Where:

  • $y_i$ = low-dimensional embedding of $x_i$
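A matching sketch for $q_{ij}$, again assuming NumPy, computes the Student t kernel for every pair and normalizes over all pairs $k \ne l$. The three-point embedding is made up for illustration.

```python
import numpy as np

def joint_q(Y: np.ndarray) -> np.ndarray:
    """q_ij from the kernel (1 + ||y_i - y_j||^2)^-1, normalized over all pairs k != l."""
    sum_sq = np.sum(Y ** 2, axis=1)
    D = np.maximum(sum_sq[:, None] + sum_sq[None, :] - 2.0 * Y @ Y.T, 0.0)
    num = 1.0 / (1.0 + D)
    np.fill_diagonal(num, 0.0)   # the k = l terms are excluded
    return num / num.sum()

Y = np.array([[0.0, 0.0], [1.0, 0.0], [5.0, 0.0]])  # hypothetical 2-D embedding
Q = joint_q(Y)
print(np.round(Q, 3))
```

The close pair (points 0 and 1) receives most of the probability mass, while the two far pairs receive little.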

Optimization Objective

t-SNE minimizes the mismatch between:

  • the high-dimensional similarities $P$
  • the low-dimensional similarities $Q$

Using the Kullback-Leibler (KL) divergence:

$$KL(P \| Q) = \sum_{i \ne j} p_{ij} \log \frac{p_{ij}}{q_{ij}}$$
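The loss can be written directly from the formula. In this sketch (NumPy assumed), terms with $p_{ij} = 0$ are skipped because $0 \log 0$ is taken as 0, and a small `eps` guards against a zero $q_{ij}$. The function name and both matrices are hypothetical.

```python
import numpy as np

def kl_divergence(P: np.ndarray, Q: np.ndarray, eps: float = 1e-12) -> float:
    """KL(P || Q) = sum over i != j of p_ij * log(p_ij / q_ij).

    Pairs with p_ij = 0 (including the diagonal) contribute nothing.
    eps only guards against division by a zero q_ij.
    """
    mask = P > 0
    return float(np.sum(P[mask] * np.log(P[mask] / np.maximum(Q[mask], eps))))

# Hypothetical 3-point example: symmetric, zero-diagonal matrices that each sum to 1.
P = np.array([[0.0, 0.4, 0.1],
              [0.4, 0.0, 0.0],
              [0.1, 0.0, 0.0]])
Q = np.array([[0.0, 0.25, 0.25],
              [0.25, 0.0, 0.0],
              [0.25, 0.0, 0.0]])
print(kl_divergence(P, P))  # 0.0: identical distributions
print(kl_divergence(P, Q))  # about 0.1927
```

Because each term is weighted by $p_{ij}$, a similar pair (large $p_{ij}$) placed far apart (small $q_{ij}$) is expensive, while a dissimilar pair placed close together costs little. This asymmetry is why t-SNE focuses on local structure.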

Optimization Flow

flowchart TD

    A[High-Dimensional Similarities P] --> C[KL Divergence Loss]

    B[Low-Dimensional Similarities Q] --> C

    C --> D[Gradient Descent]

    D --> E[Updated Embeddings]
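The optimization loop can be sketched with the gradient from the original t-SNE paper:

$$\frac{\partial KL}{\partial y_i} = 4 \sum_j (p_{ij} - q_{ij})(y_i - y_j)(1 + \|y_i - y_j\|^2)^{-1}$$

This is a deliberately minimal, exact ($O(n^2)$) version assuming NumPy: it uses a single shared $\sigma$ instead of a perplexity search, plain gradient descent with a small hand-picked learning rate, and none of the momentum or early exaggeration tricks that practical implementations add. All names, the toy data, and the parameter values are illustrative assumptions.

```python
import numpy as np

def pairwise_sq_dists(Z: np.ndarray) -> np.ndarray:
    """||z_i - z_j||^2 for every pair of rows."""
    sum_sq = np.sum(Z ** 2, axis=1)
    return np.maximum(sum_sq[:, None] + sum_sq[None, :] - 2.0 * Z @ Z.T, 0.0)

def joint_p_fixed_sigma(X: np.ndarray, sigma: float) -> np.ndarray:
    """Symmetrized p_ij with ONE shared sigma (simplification: t-SNE tunes sigma_i per point)."""
    logits = -pairwise_sq_dists(X) / (2.0 * sigma ** 2)
    np.fill_diagonal(logits, -np.inf)
    logits -= logits.max(axis=1, keepdims=True)
    P_cond = np.exp(logits)
    P_cond /= P_cond.sum(axis=1, keepdims=True)
    return (P_cond + P_cond.T) / (2.0 * X.shape[0])

def q_and_kernel(Y: np.ndarray):
    """Return q_ij and the Student t kernel W_ij = (1 + ||y_i - y_j||^2)^-1 (zero diagonal)."""
    W = 1.0 / (1.0 + pairwise_sq_dists(Y))
    np.fill_diagonal(W, 0.0)
    return W / W.sum(), W

def kl(P: np.ndarray, Q: np.ndarray, eps: float = 1e-12) -> float:
    mask = P > 0
    return float(np.sum(P[mask] * np.log(P[mask] / np.maximum(Q[mask], eps))))

def tsne_gradient(P: np.ndarray, Y: np.ndarray) -> np.ndarray:
    """dKL/dy_i = 4 * sum_j (p_ij - q_ij) * (y_i - y_j) * W_ij."""
    Q, W = q_and_kernel(Y)
    M = (P - Q) * W
    # sum_j M_ij (y_i - y_j) = y_i * sum_j M_ij - sum_j M_ij y_j
    return 4.0 * (M.sum(axis=1)[:, None] * Y - M @ Y)

def tsne_gd(X: np.ndarray, sigma: float, n_iter: int = 1000,
            learning_rate: float = 2.0, seed: int = 0):
    """Plain gradient descent on KL(P || Q); returns the 2-D embedding and P."""
    rng = np.random.default_rng(seed)
    P = joint_p_fixed_sigma(X, sigma)
    Y = rng.normal(scale=1e-4, size=(X.shape[0], 2))  # small random start
    for _ in range(n_iter):
        Y -= learning_rate * tsne_gradient(P, Y)
    return Y, P

# Two hypothetical, well-separated clusters of 10 points each in 10-D.
rng = np.random.default_rng(42)
X = rng.normal(size=(20, 10))
X[10:, 0] += 10.0
Y, P = tsne_gd(X, sigma=np.sqrt(10.0))
Y_init = np.random.default_rng(0).normal(scale=1e-4, size=(20, 2))  # same start as in tsne_gd
print("KL at start:", kl(P, q_and_kernel(Y_init)[0]))
print("KL at end:  ", kl(P, q_and_kernel(Y)[0]))
```

On this toy data the KL divergence should drop from its starting value, and the two groups should end up as separate clumps in `Y`. For real datasets, a library implementation with perplexity calibration and an approximate gradient (such as Barnes-Hut) is the practical choice.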

Important Hyperparameters

| Parameter | Purpose |
|---|---|
| Perplexity | Controls the effective neighborhood size |
| Learning rate | Optimization step size |
| Iterations | Number of optimization steps |
| Dimensions | Output dimension (2D/3D) |

Perplexity

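Perplexity can be read as a smooth measure of the effective number of neighbors each point considers. It is defined from the Shannon entropy $H(P_i)$ of the conditional distribution $P_i$:

$$\text{Perp}(P_i) = 2^{H(P_i)}, \quad H(P_i) = -\sum_j p_{j|i} \log_2 p_{j|i}$$

Perplexity balances:

  • local structure (lower values)
  • global structure (higher values)

Typical values:

$$5 \le \text{Perplexity} \le 50$$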
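Each $\sigma_i$ is usually found by a binary search, since the perplexity of $P_i$ grows monotonically with $\sigma_i$. The sketch below (NumPy assumed) uses $\text{Perp}(P_i) = 2^{H(P_i)}$ with $H$ in bits, searches on a log scale between hypothetical bounds, and stops once the perplexity is within a tolerance of the target. The function names and the 100 random 50-dimensional points are made up for illustration.

```python
import numpy as np

def row_conditional_p(sq_dists_i: np.ndarray, sigma: float) -> np.ndarray:
    """p_{j|i} for one point, given its squared distances to the OTHER points."""
    logits = -sq_dists_i / (2.0 * sigma ** 2)
    logits -= logits.max()
    p = np.exp(logits)
    return p / p.sum()

def perplexity(p: np.ndarray) -> float:
    """Perp(P_i) = 2 ** H(P_i), with the entropy H measured in bits."""
    p = p[p > 0]
    return float(2.0 ** (-np.sum(p * np.log2(p))))

def find_sigma(sq_dists_i: np.ndarray, target_perplexity: float,
               tol: float = 1e-5, max_iter: int = 100) -> float:
    """Binary search for sigma_i; perplexity increases monotonically with sigma."""
    lo, hi = 1e-10, 1e10  # hypothetical search bounds
    for _ in range(max_iter):
        sigma = float(np.sqrt(lo * hi))  # geometric midpoint: sigma spans many orders of magnitude
        perp = perplexity(row_conditional_p(sq_dists_i, sigma))
        if abs(perp - target_perplexity) < tol:
            break
        if perp > target_perplexity:
            hi = sigma  # neighborhood too wide: shrink sigma
        else:
            lo = sigma  # neighborhood too narrow: grow sigma
    return sigma

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 50))
d2 = np.sum((X[1:] - X[0]) ** 2, axis=1)  # squared distances from point 0 to the rest
sigma_0 = find_sigma(d2, target_perplexity=30.0)
print(sigma_0, perplexity(row_conditional_p(d2, sigma_0)))
```

Repeating this search for every point gives a different $\sigma_i$ per point: points in dense regions get small bandwidths and points in sparse regions get larger ones, which is how a single perplexity value adapts to varying density.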

Advantages

  • Excellent visualization quality
  • Preserves local neighborhoods
  • Works well with embeddings
  • Reveals hidden clusters

Limitations

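| Limitation | Description |
|---|---|
| Computationally expensive | Exact t-SNE scales as O(n²) in the number of points, so it is slow on large datasets |
| Not deterministic | Random initialization and a non-convex objective mean different runs can produce different layouts |
| Poor global distance preservation | Distances between far-apart clusters (and cluster sizes) may be distorted |
| Primarily a visualization tool | Standard t-SNE learns no mapping for new points, so it is not ideal for downstream ML |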

t-SNE vs PCA

| PCA | t-SNE |
|---|---|
| Linear reduction | Nonlinear reduction |
| Fast | Slower |
| Preserves variance | Preserves neighborhoods |
| Good for preprocessing | Good for visualization |

PCA + t-SNE Pipeline

Common workflow:

flowchart TD

    A[High-Dimensional Data]

    A --> B[PCA Reduction]

    B --> C[t-SNE]

    C --> D[2D Visualization]

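PCA first reduces the data to a moderate number of dimensions (around 50 is a common choice), which suppresses noise and makes the pairwise distance computations in t-SNE cheaper.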
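The PCA step can be sketched with a plain SVD in NumPy. The target of 50 components is a common rule of thumb rather than a fixed requirement, the function name and random 512-dimensional data are hypothetical, and the reduced matrix would then be passed to whichever t-SNE implementation is in use.

```python
import numpy as np

def pca_reduce(X: np.ndarray, n_components: int = 50) -> np.ndarray:
    """Project centered data onto its top principal components using an SVD."""
    Xc = X - X.mean(axis=0)                        # PCA requires centered data
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T                # equivalently U[:, :k] * S[:k]

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 512))   # hypothetical 512-d embeddings
X50 = pca_reduce(X, 50)
print(X50.shape)  # (300, 50)
```

The columns of `X50` come out ordered by decreasing variance, so keeping the first 50 retains the directions that carry the most variation.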

t-SNE vs UMAP

| t-SNE | UMAP |
|---|---|
| Better local structure | Better global structure |
| Slower | Faster |
| More computationally expensive | More scalable |
| Widely used historically | Increasingly popular |
