Dimensionality Reduction in Machine Learning
Learn how dimensionality reduction simplifies high-dimensional data while preserving important patterns. Explore techniques like PCA and understand how reducing features improves model performance, visualization, and computational efficiency.
Dimensionality Reduction
Dimensionality reduction is a type of unsupervised learning.
The idea is simple:
Take high-dimensional data and represent it using fewer dimensions while preserving as much important structure as possible.
- This is an approximation.
- We lose some information because we are projecting.
- But if most of the data lies near a lower-dimensional structure, the loss is small.
Usually we start with:
x ∈ R^n
And reduce to:
z ∈ R^k, with k < n
Example:
- 1000D → 100D
- 300D → 50D
Dimensionality reduction finds:
- A lower-dimensional subspace (line, plane, etc.)
- That captures most of the variance in the data
Then it projects the data onto that subspace.
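As a minimal NumPy sketch of that projection step (the direction vector here is hand-picked for illustration, not learned from data):

```python
import numpy as np

# A unit vector defining the 1D subspace (a line through the origin).
u = np.array([3.0, 4.0]) / 5.0   # unit length: (3/5)^2 + (4/5)^2 = 1

# An original 2D example.
x = np.array([6.0, 8.0])

# Projection: one number, the coordinate of x along u.
z = u @ x                        # mathematically 6*(3/5) + 8*(4/5) = 10

# Reconstruction: map the single number back into 2D space.
# Here it recovers x (up to rounding) because x lies on the line.
x_approx = z * u
```

In practice the direction is not hand-picked: it is chosen to capture as much variance as possible, which is exactly what PCA computes.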
Advantages
There are two main reasons to reduce dimensionality:
1. Data Compression
- Store fewer numbers per example
- Reduce memory and disk usage
2. Faster Learning
- Many algorithms scale with number of features
- Fewer features → faster training and prediction
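To make the compression concrete, here is a small NumPy check of the storage saved by a 1000D → 100D reduction (the array sizes are illustrative):

```python
import numpy as np

n_examples = 10_000

# 1000 features per example vs. 100 after reduction, stored as float64.
original = np.zeros((n_examples, 1000))
reduced = np.zeros((n_examples, 100))

print(original.nbytes)   # 80,000,000 bytes (~80 MB)
print(reduced.nbytes)    # 8,000,000 bytes (~8 MB): a 10x saving
```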
What Projection Means
Suppose original examples are:
x^(i) = (x1^(i), x2^(i)) ∈ R^2
After projection onto a line:
z^(i) ∈ R
So instead of storing:
(x1^(i), x2^(i))
We store:
z^(i)
One number instead of two.
Example 1: Redundant Features (2D → 1D)
Suppose:
- x1 = length in centimeters
- x2 = length in inches
These features are highly correlated.
Instead of storing:
(x1, x2)
We can project the data onto a line and represent each example with a single number:
z1
So:
- Original representation: 2 numbers per example
- Reduced representation: 1 number per example
We approximate the original data by projecting onto a line that captures the main direction of variation.
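A sketch of this example in NumPy, using synthetic cm/inch measurements (the dataset, noise level, and random seed are made up for illustration) and the SVD of the centered data to find the line:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic lengths in centimeters, plus the same lengths in inches
# (with a little measurement noise): two highly correlated features.
cm = rng.uniform(10, 200, size=100)
inch = cm / 2.54 + rng.normal(0, 0.1, size=100)
X = np.column_stack([cm, inch])

# The top right-singular vector of the centered data is the main
# direction of variation.
mu = X.mean(axis=0)
_, _, Vt = np.linalg.svd(X - mu, full_matrices=False)
u = Vt[0]                        # unit vector along the line

# Project: each 2D example becomes a single number z.
z = (X - mu) @ u                 # shape (100,)

# Reconstruct and measure how much information was lost.
X_approx = mu + np.outer(z, u)
err = np.abs(X - X_approx).max()
print(err)                       # small: the data really is nearly 1D
```

The reconstruction error is on the order of the measurement noise, confirming that one number per example is almost lossless here.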
Example 2: 3D → 2D
Now suppose:
x = (x1, x2, x3) ∈ R^3
But the data roughly lies on a plane.
Instead of keeping 3 coordinates:
(x1, x2, x3)
We project onto a 2D plane and represent each example as:
(z1, z2)
Now we only need two numbers instead of three.
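A similar sketch for the 3D → 2D case, with synthetic points generated to lie near a plane (the data-generating rule and seed are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)

# 200 points that lie almost exactly on a 2D plane inside 3D space:
# x3 is (nearly) a linear combination of x1 and x2.
a, b = rng.normal(size=(2, 200))
X = np.column_stack([a, b, 2 * a - b + rng.normal(0, 0.01, size=200)])

# The top two right-singular vectors span the best-fitting plane.
mu = X.mean(axis=0)
_, _, Vt = np.linalg.svd(X - mu, full_matrices=False)
V2 = Vt[:2]                      # shape (2, 3)

# Each 3D example is now two numbers (z1, z2).
Z = (X - mu) @ V2.T              # shape (200, 2)
print(Z.shape)
```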
Summary
Dimensionality reduction:
- Removes redundancy
- Compresses data
- Speeds up learning
- Represents high-dimensional data using fewer variables
In the next step, we usually use Principal Component Analysis (PCA) to compute the optimal projection direction mathematically.
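As a preview of that idea, here is a minimal PCA-style computation in NumPy: the top eigenvector of the covariance matrix gives the best 1D projection direction (random data stands in for a real dataset):

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(500, 5))    # stand-in dataset: 500 examples, 5 features

# Center the data and compute its covariance matrix.
Xc = X - X.mean(axis=0)
cov = Xc.T @ Xc / (len(X) - 1)

# Eigenvectors of the covariance matrix are the principal directions.
# np.linalg.eigh returns eigenvalues in ascending order, so the last
# eigenvector is the direction of maximum variance.
eigvals, eigvecs = np.linalg.eigh(cov)
u = eigvecs[:, -1]               # optimal 1D projection direction

z = Xc @ u                       # one number per example
```

The variance of `z` equals the largest eigenvalue, which is exactly the sense in which this direction is optimal.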
Example: Countries Dataset
Suppose we collect a large dataset containing statistics about countries around the world.
Each country may have 50 features, such as:
- x1 = GDP (Gross Domestic Product)
- x2 = GDP per capita
- x3 = Human Development Index
- x4 = Life expectancy
Each country is represented as:
x^(i) = (x1^(i), x2^(i), ..., x50^(i))
So every country corresponds to a 50-dimensional feature vector.
Reducing 50D to 2D
Using dimensionality reduction, we can transform each country:
x^(i) ∈ R^50 → z^(i) ∈ R^2
Now each country is represented by only two numbers:
(z1, z2)
This allows us to plot every country as a point in 2D space.
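A sketch of that pipeline on stand-in data (195 random "countries" with 50 synthetic features each; a real analysis would use the actual dataset):

```python
import numpy as np

rng = np.random.default_rng(3)

# Stand-in for the countries dataset: 195 countries, 50 features each.
X = rng.normal(size=(195, 50))

# Reduce 50D -> 2D using the top two singular directions.
mu = X.mean(axis=0)
_, _, Vt = np.linalg.svd(X - mu, full_matrices=False)
Z = (X - mu) @ Vt[:2].T

print(Z.shape)   # (195, 2): one (z1, z2) point per country, ready to plot
```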
