
t-SNE (t-distributed Stochastic Neighbor Embedding) Explained

Learn how t-SNE works for dimensionality reduction and data visualization, including high-dimensional embeddings, neighborhood preservation, probability distributions, KL divergence, and clustering visualization.

Written by Hitesh Sahu, a passionate developer and blogger.

Tue May 26 2026


t-SNE (t-distributed Stochastic Neighbor Embedding)

t-SNE is a nonlinear dimensionality reduction algorithm that maps high-dimensional data to two or three dimensions for visualization while preserving local similarity structure.

It is widely used for:

  • clustering visualization
  • embedding visualization
  • feature space analysis
  • latent space exploration

Core Idea

t-SNE preserves:

  • local structure
  • neighborhood similarity

Points that are close in the high-dimensional space are placed close together in the low-dimensional embedding. Distances between far-apart points are not reliably preserved.

t-SNE Visualization Example

flowchart LR

    A1[Cat Images]
    A2[Dog Images]
    A3[Car Images]

    A1 --> B[t-SNE Projection]
    A2 --> B
    A3 --> B

    B --> C[Clustered 2D Visualization]

Why t-SNE is Needed

High-dimensional data is difficult to visualize directly.

Examples:

  • Word embeddings
  • Image embeddings
  • Transformer hidden states
  • Feature vectors

t-SNE converts:

$$\text{High-Dimensional Space} \rightarrow \text{Low-Dimensional Visualization}$$

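t-SNE uses a heavy-tailed Student t-distribution in the low-dimensional space to:

  • alleviate the crowding problem, where moderately distant points would otherwise be squeezed together
  • let dissimilar points sit farther apart in the embedding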
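To make the heavy-tail argument concrete, the following minimal sketch compares an unnormalized Gaussian kernel with an unnormalized Student t kernel (one degree of freedom) at a few distances. The function names and the unit bandwidth are assumptions chosen for illustration only.

```python
import numpy as np

def gaussian_kernel(d):
    """Unnormalized Gaussian similarity with unit bandwidth (assumed)."""
    return np.exp(-d**2 / 2.0)

def student_t_kernel(d):
    """Unnormalized Student t similarity with one degree of freedom."""
    return 1.0 / (1.0 + d**2)

distances = np.array([0.5, 1.0, 2.0, 4.0, 8.0])
gauss = gaussian_kernel(distances)
student = student_t_kernel(distances)

# Print both kernels side by side; the Student t value decays far more slowly.
for d, g, t in zip(distances, gauss, student):
    print(f"distance={d:4.1f}  gaussian={g:.2e}  student_t={t:.2e}")
```

Because the Student t kernel still assigns noticeable similarity at large distances, the embedding can place dissimilar points far apart without paying a large penalty, which leaves room for nearby points.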

Applications of t-SNE

  • NLP Embeddings
  • Image Feature Visualization
  • Transformer Embeddings
  • Clustering Analysis
  • Latent Space Visualization
  • Anomaly Detection

High-Level Workflow

flowchart TD

    A[High-Dimensional Data] --> B[Compute Pairwise Similarities]

    B --> C[Convert to Probability Distribution]

    C --> D[t-SNE Optimization]

    D --> E[2D or 3D Embedding]

    E --> F[Visualization]

Example

Suppose each image is represented by:

$$x_i \in \mathbb{R}^{512}$$

t-SNE reduces it into:

$$y_i \in \mathbb{R}^{2}$$

for visualization.


Step 1: Similarity in High-Dimensional Space

t-SNE converts pairwise distances into conditional probabilities that express similarity between points.

The probability that point $x_j$ is a neighbor of $x_i$ is:

$$p_{j|i} = \frac{\exp(-\|x_i - x_j\|^2 / 2\sigma_i^2)}{\sum_{k \ne i} \exp(-\|x_i - x_k\|^2 / 2\sigma_i^2)}$$

Where:

  • $x_i$ = data point
  • $\sigma_i$ = bandwidth (standard deviation) of the Gaussian centered on $x_i$, chosen per point to match the target perplexity
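The conditional probabilities from Step 1 can be sketched in NumPy as follows. For brevity this sketch assumes a single shared bandwidth `sigma` for all points, whereas t-SNE tunes one bandwidth per point. The function name and the random data are illustrative.

```python
import numpy as np

def conditional_probabilities(X, sigma):
    """Return P_cond with P_cond[i, j] = p_{j|i}, using one shared sigma (a simplification)."""
    sq_norms = np.sum(X**2, axis=1)
    # Pairwise squared Euclidean distances via ||a||^2 + ||b||^2 - 2 a.b
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T
    sq_dists = np.maximum(sq_dists, 0.0)  # clip small negative rounding errors
    logits = -sq_dists / (2.0 * sigma**2)
    np.fill_diagonal(logits, -np.inf)  # exclude k = i from the sum
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(logits)
    return weights / weights.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 512))  # placeholder data, e.g. six 512-dimensional embeddings
P_cond = conditional_probabilities(X, sigma=20.0)
print(P_cond.sum(axis=1))  # each row is a probability distribution over neighbors
```

Note that `P_cond` is generally not symmetric: $p_{j|i}$ and $p_{i|j}$ are normalized over different rows.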

Step 2: Similarity in Low-Dimensional Space

The low-dimensional similarity uses a Student t-distribution with one degree of freedom:

$$q_{ij} = \frac{(1 + \|y_i - y_j\|^2)^{-1}}{\sum_{k \ne l} (1 + \|y_k - y_l\|^2)^{-1}}$$

Where:

  • $y_i$ = low-dimensional embedding of $x_i$
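The low-dimensional similarities from Step 2 can be computed with a few lines of NumPy. This is a minimal sketch; the function name and the random 2D points are assumptions for illustration.

```python
import numpy as np

def low_dim_affinities(Y):
    """Return Q with Q[i, j] = q_ij, normalized over all pairs k != l."""
    diff = Y[:, None, :] - Y[None, :, :]
    sq_dists = np.sum(diff**2, axis=-1)
    inv = 1.0 / (1.0 + sq_dists)  # Student t kernel, one degree of freedom
    np.fill_diagonal(inv, 0.0)    # pairs with k == l are excluded
    return inv / inv.sum()

rng = np.random.default_rng(0)
Y = rng.normal(size=(6, 2))  # placeholder 2D embedding
Q = low_dim_affinities(Y)
print(Q.sum())  # the whole matrix sums to 1
```

Unlike the conditional probabilities in the high-dimensional space, `Q` is a single joint distribution over all pairs, so it is symmetric by construction.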

Optimization Objective

t-SNE minimizes divergence between:

  • high-dimensional similarity
  • low-dimensional similarity

Using KL divergence:

$$KL(P \| Q) = \sum_{i \ne j} p_{ij} \log \frac{p_{ij}}{q_{ij}}$$

Here $p_{ij} = \frac{p_{j|i} + p_{i|j}}{2N}$ is the symmetrized joint probability over $N$ data points.
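As a small worked example, the sketch below symmetrizes a hand-written conditional matrix into joint probabilities (averaging $p_{j|i}$ and $p_{i|j}$ and dividing by the number of points, as in the original t-SNE formulation) and evaluates the KL divergence against a uniform $Q$. The matrices and function names are made up for illustration.

```python
import numpy as np

def symmetrize(P_cond):
    """Turn conditional probabilities p_{j|i} into joint probabilities p_ij."""
    n = P_cond.shape[0]
    return (P_cond + P_cond.T) / (2.0 * n)

def kl_divergence(P, Q, eps=1e-12):
    """KL(P || Q) summed over entries with p_ij > 0 (eps guards against log of zero)."""
    mask = P > 0
    return float(np.sum(P[mask] * np.log(P[mask] / np.maximum(Q[mask], eps))))

# Toy conditional probabilities for three points (rows sum to 1, zero diagonal).
P_cond = np.array([
    [0.0, 0.9, 0.1],
    [0.8, 0.0, 0.2],
    [0.5, 0.5, 0.0],
])
P = symmetrize(P_cond)

# A uniform Q over the six off-diagonal pairs.
Q_uniform = (np.ones((3, 3)) - np.eye(3)) / 6.0

print(kl_divergence(P, P))          # zero when the two distributions match
print(kl_divergence(P, Q_uniform))  # positive when they differ
```

The divergence is asymmetric: a large $p_{ij}$ paired with a small $q_{ij}$ is penalized heavily, which is why t-SNE prioritizes keeping true neighbors close.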

Optimization Flow

flowchart TD

    A[High-Dimensional Similarities P] --> C[KL Divergence Loss]

    B[Low-Dimensional Similarities Q] --> C

    C --> D[Gradient Descent]

    D --> E[Updated Embeddings]
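The optimization loop in the flow above can be sketched with plain gradient descent. This is a simplified illustration, not a full implementation: it assumes a single shared bandwidth, and it omits momentum, early exaggeration, and the per-point perplexity search that practical implementations use. It relies on the standard t-SNE gradient $\frac{\partial C}{\partial y_i} = 4 \sum_j (p_{ij} - q_{ij})(y_i - y_j)(1 + \|y_i - y_j\|^2)^{-1}$. All names, the toy data, and the step size are assumptions.

```python
import numpy as np

def joint_p(X, sigma=1.0):
    """Joint probabilities p_ij with one shared bandwidth (a simplification)."""
    sq = np.sum((X[:, None, :] - X[None, :, :])**2, axis=-1)
    W = np.exp(-sq / (2.0 * sigma**2))
    np.fill_diagonal(W, 0.0)
    P_cond = W / W.sum(axis=1, keepdims=True)
    return (P_cond + P_cond.T) / (2.0 * len(X))

def joint_q(Y):
    """Return q_ij and the unnormalized Student t kernel values."""
    sq = np.sum((Y[:, None, :] - Y[None, :, :])**2, axis=-1)
    inv = 1.0 / (1.0 + sq)
    np.fill_diagonal(inv, 0.0)
    return inv / inv.sum(), inv

def kl(P, Q):
    mask = P > 0
    return float(np.sum(P[mask] * np.log(P[mask] / Q[mask])))

def gradient(P, Y):
    """Gradient of KL(P || Q) with respect to every embedding point."""
    Q, inv = joint_q(Y)
    coeff = (P - Q) * inv
    # sum_j coeff_ij * (y_i - y_j) written as a matrix product
    return 4.0 * (np.diag(coeff.sum(axis=1)) - coeff) @ Y

rng = np.random.default_rng(0)
# Two well-separated toy clusters in 5 dimensions.
X = np.vstack([rng.normal(0.0, 0.5, (10, 5)), rng.normal(4.0, 0.5, (10, 5))])
P = joint_p(X)

Y = rng.normal(scale=1e-2, size=(20, 2))  # small random initialization
loss_before = kl(P, joint_q(Y)[0])
for _ in range(1000):
    Y -= 0.5 * gradient(P, Y)  # plain gradient descent step
loss_after = kl(P, joint_q(Y)[0])

print(f"KL before: {loss_before:.4f}, KL after: {loss_after:.4f}")
```

Each update moves points with $p_{ij} > q_{ij}$ closer together and pushes points with $p_{ij} < q_{ij}$ apart, which matches the attract and repel intuition behind the objective.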

Important Hyperparameters

| Parameter | Purpose |
|---|---|
| Perplexity | Controls neighborhood size |
| Learning Rate | Optimization step size |
| Iterations | Number of optimization steps |
| Dimensions | Output dimension (2D/3D) |
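As one possible illustration, the sketch below shows how these hyperparameters map onto scikit-learn's `TSNE` class, assuming scikit-learn is installed. The random data stands in for real embeddings, and the chosen values are examples rather than recommendations. The iteration count is left at its default because the name of that argument differs between scikit-learn versions.

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 64))  # placeholder for real high-dimensional features

tsne = TSNE(
    n_components=2,       # Dimensions: 2D output
    perplexity=30.0,      # Perplexity: neighborhood size, must be below the sample count
    learning_rate=200.0,  # Learning Rate: optimization step size
    init="pca",           # start from a PCA projection
    random_state=0,       # fix the seed so repeated runs give the same layout
)
Y = tsne.fit_transform(X)
print(Y.shape)
```

Fixing `random_state` is worth doing when comparing hyperparameter settings, since otherwise differences between plots may come from the random initialization rather than from the setting being changed.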

Perplexity

Perplexity balances:

  • local structure
  • global structure

Typical values:

$$5 \le \text{Perplexity} \le 50$$
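Perplexity is defined as $2^{H}$, where $H$ is the Shannon entropy (in bits) of the conditional distribution around a point, so it can be read as an effective number of neighbors. The sketch below shows one way to find the Gaussian bandwidth for a single point by bisection so that its perplexity matches a target. The function names, search bounds, and random data are assumptions for illustration.

```python
import numpy as np

def perplexity_of(sq_dists_i, sigma):
    """Perplexity 2**H of the neighbor distribution for one point."""
    logits = -sq_dists_i / (2.0 * sigma**2)
    logits -= logits.max()  # numerical stability
    p = np.exp(logits)
    p /= p.sum()
    nonzero = p > 0
    entropy_bits = -np.sum(p[nonzero] * np.log2(p[nonzero]))
    return 2.0 ** entropy_bits

def find_sigma(sq_dists_i, target, lo=1e-3, hi=1e3, steps=60):
    """Bisection on a log scale; perplexity grows as sigma grows."""
    for _ in range(steps):
        mid = np.sqrt(lo * hi)  # geometric midpoint
        if perplexity_of(sq_dists_i, mid) > target:
            hi = mid
        else:
            lo = mid
    return np.sqrt(lo * hi)

rng = np.random.default_rng(0)
X = rng.normal(size=(101, 10))
# Squared distances from point 0 to the 100 other points.
sq_dists_0 = np.sum((X[1:] - X[0])**2, axis=1)

sigma_0 = find_sigma(sq_dists_0, target=30.0)
achieved = perplexity_of(sq_dists_0, sigma_0)
print(f"sigma={sigma_0:.4f}, perplexity={achieved:.4f}")
```

A small target perplexity yields a narrow Gaussian that only sees the closest neighbors, while a large target widens it so that more distant points influence the embedding.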

Advantages

  • Excellent visualization quality
  • Preserves local neighborhoods
  • Works well with embeddings
  • Reveals hidden clusters

Limitations

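| Limitation | Description |
|---|---|
| Computationally expensive | Slow on large datasets |
| Not deterministic | Results vary between runs unless the random seed is fixed |
| Poor global distance preservation | Distances between far-apart clusters are not reliable |
| Primarily a visualization tool | Not ideal as input to downstream ML models |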

t-SNE vs PCA

| PCA | t-SNE |
|---|---|
| Linear reduction | Nonlinear reduction |
| Fast | Slower |
| Preserves variance | Preserves neighborhoods |
| Good for preprocessing | Good for visualization |

PCA + t-SNE Pipeline

Common workflow:

flowchart TD

    A[High-Dimensional Data]

    A --> B[PCA Reduction]

    B --> C[t-SNE]

    C --> D[2D Visualization]

PCA first reduces noise and dimensionality before applying t-SNE.
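The pipeline above might look like the following with scikit-learn, assuming it is installed. The random input, the choice of 50 principal components, and the perplexity value are illustrative assumptions rather than settings taken from this article.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 512))  # placeholder for 512-dimensional feature vectors

# Step 1: PCA compresses the data to a moderate number of dimensions.
X_reduced = PCA(n_components=50, random_state=0).fit_transform(X)

# Step 2: t-SNE maps the PCA output to 2D for plotting.
Y = TSNE(
    n_components=2,
    perplexity=30.0,
    init="random",
    random_state=0,
).fit_transform(X_reduced)

print(X_reduced.shape, Y.shape)
```

Running t-SNE on the PCA output keeps the pairwise distance computation cheaper, because distances are computed in 50 dimensions instead of 512.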

t-SNE vs UMAP

| t-SNE | UMAP |
|---|---|
| Strong local structure | Often preserves more global structure |
| Slower | Faster |
| More computationally expensive | More scalable |
| Widely used historically | Increasingly popular |
