Principal Component Analysis (PCA) Explained
The most widely used algorithm for dimensionality reduction.
Intuition in 2D → 1D
Suppose we have a dataset $x^{(1)}, \dots, x^{(m)} \in \mathbb{R}^2$,
and we want to reduce the data from 2 dimensions to 1 dimension.
That means:
- We want to find a line onto which to project all the data points
The key question:
Which line should we choose?
Good Projection Direction
A good projection line is one where, when we project each point onto the line, the distance between the original point and its projection is small.
These distances are called projection errors:
The orthogonal distances from the points to the line.
PCA chooses the line that minimizes the average squared projection error:

$$\frac{1}{m} \sum_{i=1}^{m} \left\| x^{(i)} - x_{\text{proj}}^{(i)} \right\|^2$$

where $x_{\text{proj}}^{(i)}$ is the projection of $x^{(i)}$ onto the line.
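This objective is easy to check numerically. A minimal sketch with made-up 2D points (all data values are hypothetical, chosen to lie roughly along the diagonal):

```python
import numpy as np

# Toy 2D data (hypothetical values, roughly along the diagonal).
X = np.array([[1.0, 1.1], [2.0, 1.9], [3.0, 3.2], [4.0, 3.8]])
X = X - X.mean(axis=0)  # mean-normalize first

def projection_error(X, u):
    """Total squared orthogonal distance from the points in X to the
    line through the origin with direction u."""
    u = u / np.linalg.norm(u)
    proj = np.outer(X @ u, u)   # each point's projection onto the line
    return np.sum((X - proj) ** 2)

# The diagonal direction fits this data far better than the x-axis.
err_diag = projection_error(X, np.array([1.0, 1.0]))
err_xaxis = projection_error(X, np.array([1.0, 0.0]))
```

Comparing `err_diag` and `err_xaxis` shows why the choice of line matters: the diagonal direction leaves only tiny residuals, while the x-axis leaves large ones.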
General Case: nD → kD
Now suppose $x^{(i)} \in \mathbb{R}^n$,
and we want to reduce to $k$ dimensions, with $k < n$.
Instead of finding one vector, we find $k$ vectors:

$$u^{(1)}, u^{(2)}, \dots, u^{(k)}$$
These vectors:
- Define a k-dimensional surface
- Span a k-dimensional linear subspace
We then project each point onto that subspace.
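Projecting onto a $k$-dimensional subspace spanned by orthonormal vectors is a single matrix product. A sketch with synthetic data (the shapes and the random seed are arbitrary); the fact that PCA's vectors are the top right singular vectors of the centered data matrix is standard:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))   # 100 synthetic points in R^5
X = X - X.mean(axis=0)

# PCA's vectors u^(1), ..., u^(k) are the top-k right singular vectors
# of the centered data matrix.
k = 2
U = np.linalg.svd(X, full_matrices=False)[2][:k].T  # shape (5, k), orthonormal columns

# Project every point onto the subspace spanned by the columns of U.
X_proj = X @ U @ U.T   # still in R^5, but lying on the k-dim subspace
```

Because the columns of `U` are orthonormal, `U @ U.T` is a projection matrix: applying it a second time leaves `X_proj` unchanged.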
3D → 2D Example
If $x^{(i)} \in \mathbb{R}^3$
and we reduce to 2D:
- We find two vectors $u^{(1)}$ and $u^{(2)}$.
- These define a plane.
- Each point is projected onto that plane.
The projection error is:

$$\left\| x^{(i)} - x_{\text{proj}}^{(i)} \right\|^2$$

where:
- $x_{\text{proj}}^{(i)}$ is the projected version of $x^{(i)}$

PCA minimizes:

$$\frac{1}{m} \sum_{i=1}^{m} \left\| x^{(i)} - x_{\text{proj}}^{(i)} \right\|^2$$
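For one concrete 3D → 2D projection, take a plane spanned by two orthonormal vectors. The choice below is deliberately simple (the x- and y-axes, so the plane is the xy-plane) to make the result easy to verify by hand:

```python
import numpy as np

# Two orthonormal vectors spanning a plane in R^3 (a deliberately simple
# choice: the x- and y-axes, so the plane is the xy-plane).
u1 = np.array([1.0, 0.0, 0.0])
u2 = np.array([0.0, 1.0, 0.0])
U = np.column_stack([u1, u2])      # shape (3, 2)

x = np.array([2.0, -1.0, 0.5])
x_proj = U @ (U.T @ x)             # projection of x onto the plane
error = np.sum((x - x_proj) ** 2)  # squared projection error ||x - x_proj||^2
# Projecting onto the xy-plane just zeroes the z-coordinate,
# so x_proj = [2, -1, 0] and error = 0.5**2 = 0.25.
```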
2D → 1D Example
We want to find a vector $u^{(1)} \in \mathbb{R}^2$
that defines the direction of the line.
PCA solves:

$$\min_{u^{(1)}} \frac{1}{m} \sum_{i=1}^{m} \left\| x^{(i)} - x_{\text{proj}}^{(i)} \right\|^2$$
So PCA finds the direction that minimizes the total squared orthogonal distance.
Important:
- If PCA returns $u^{(1)}$ or $-u^{(1)}$, it does not matter.
- Both define the same line.
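The sign ambiguity can be verified directly: $u$ and $-u$ give exactly the same objective value (the toy data below is made up):

```python
import numpy as np

X = np.array([[1.0, 1.0], [2.0, 2.1], [3.0, 2.9]])
X = X - X.mean(axis=0)

def projection_error(X, u):
    """Total squared orthogonal distance to the line with direction u."""
    u = u / np.linalg.norm(u)
    proj = np.outer(X @ u, u)
    return np.sum((X - proj) ** 2)

u = np.array([1.0, 1.0])
# u and -u span the same line, so the projection error is identical.
same = np.isclose(projection_error(X, u), projection_error(X, -u))
```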
Preprocessing Step
Before applying PCA, it is standard to:
- Perform mean normalization
- Perform feature scaling
So that:
- Each feature has zero mean
- Features have comparable ranges
This prevents one feature from dominating purely due to scale.
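A common way to do both steps at once is standardization. A sketch with hypothetical features on very different scales (house size in square feet vs. number of bedrooms — the values are invented):

```python
import numpy as np

# Hypothetical features on very different scales:
# house size in square feet vs. number of bedrooms.
X = np.array([[2100.0, 3.0],
              [1600.0, 2.0],
              [2400.0, 4.0],
              [1400.0, 2.0]])

mu = X.mean(axis=0)      # mean normalization
sigma = X.std(axis=0)    # feature scaling (standardization)
X_scaled = (X - mu) / sigma
```

After this step each feature has zero mean and unit variance, so neither one dominates the projection-error objective purely because of its units.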
PCA vs Linear Regression (Very Important)
PCA is NOT linear regression.
Linear Regression:
- Predicts a special target variable $y$
- Minimizes vertical squared errors
- Error is measured in the y-direction only
PCA:
- Has no special target variable
- All features are treated equally
- Minimizes orthogonal (shortest) distance to a line/plane
Linear regression minimizes:

$$\sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$$

PCA minimizes:

$$\frac{1}{m} \sum_{i=1}^{m} \left\| x^{(i)} - x_{\text{proj}}^{(i)} \right\|^2$$
These are completely different objectives.
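The difference can be made concrete on synthetic 2D data (the slope 0.8, the noise level, and the seed are all arbitrary choices). Each method wins on its own objective: regression has the smaller vertical error, PCA the smaller orthogonal error:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(-1.0, 1.0, size=50)
y = 0.8 * x + rng.normal(scale=0.1, size=50)  # noisy line (synthetic data)
X = np.column_stack([x, y])
X = X - X.mean(axis=0)                        # center both features
xc, yc = X[:, 0], X[:, 1]

# Linear regression (through the origin, since the data is centered):
# minimizes VERTICAL squared error sum((y - s*x)^2).
s_reg = (xc @ yc) / (xc @ xc)

# PCA: the first right singular vector of X minimizes ORTHOGONAL squared error.
u = np.linalg.svd(X, full_matrices=False)[2][0]
s_pca = u[1] / u[0]

def vertical_error(s):
    return np.sum((yc - s * xc) ** 2)

def orthogonal_error(v):
    v = v / np.linalg.norm(v)
    proj = np.outer(X @ v, v)
    return np.sum((X - proj) ** 2)
```

`vertical_error(s_reg) <= vertical_error(s_pca)` and `orthogonal_error(u) <= orthogonal_error([1, s_reg])` both hold by construction, which is exactly the sense in which the two objectives differ.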
Final Summary
PCA:
- Finds a lower-dimensional subspace
- Projects data onto that subspace
- Minimizes squared orthogonal projection error
- Treats all features symmetrically
- Is not a predictive model
Formally, PCA solves:

$$\min \frac{1}{m} \sum_{i=1}^{m} \left\| x^{(i)} - x_{\text{proj}}^{(i)} \right\|^2$$

where $x_{\text{proj}}^{(i)}$ is the projection of $x^{(i)}$ onto a $k$-dimensional subspace.
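Putting all the pieces together, here is a minimal NumPy-only PCA (mean normalization, SVD for the directions, projection) on a synthetic elongated point cloud; the data shape and axis scales are arbitrary:

```python
import numpy as np

def pca(X, k):
    """Return the top-k principal directions, the k-dim representation,
    and the reconstruction of each point on the k-dim subspace.

    Minimizes (1/m) * sum ||x_i - x_proj_i||^2 over all k-dim subspaces."""
    X = X - X.mean(axis=0)                        # mean normalization
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    U = Vt[:k].T                                  # columns u^(1), ..., u^(k)
    Z = X @ U                                     # k-dimensional representation
    X_proj = Z @ U.T                              # reconstruction in R^n
    return U, Z, X_proj

rng = np.random.default_rng(2)
# Elongated cloud: large spread along two axes, tiny spread along the third.
X = rng.normal(size=(200, 3)) @ np.diag([3.0, 1.0, 0.1])
U, Z, X_proj = pca(X, k=2)
```

Because almost all of the variance lies in two directions, the squared projection error of the 2D reconstruction is a tiny fraction of the total variance, which is precisely the objective PCA minimizes.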
