Principal Component Analysis (PCA) Explained
The most widely used algorithm for dimensionality reduction.
Intuition in 2D → 1D
Suppose we have a dataset $x^{(1)}, \dots, x^{(m)} \in \mathbb{R}^2$,
and we want to reduce the data from 2 dimensions to 1 dimension.
That means:
- We want to find a line onto which to project all the data points
The key question:
Which line should we choose?
Good Projection Direction
A good projection line is one where, when we project each point onto the line, the distance between the original point and its projection is small.
These distances are called projection errors:
The orthogonal distances from the points to the line.
PCA chooses the line that minimizes the average squared projection error:

$$\frac{1}{m} \sum_{i=1}^{m} \left\| x^{(i)} - x_{\text{proj}}^{(i)} \right\|^2$$

where $x_{\text{proj}}^{(i)}$ is the projection of $x^{(i)}$ onto the line.
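This objective is easy to check numerically. A minimal sketch with made-up 2D points (all data values are hypothetical, chosen to lie roughly along the diagonal):

```python
import numpy as np

# Toy 2D data (hypothetical values, roughly along the diagonal).
X = np.array([[1.0, 1.1], [2.0, 1.9], [3.0, 3.2], [4.0, 3.8]])
X = X - X.mean(axis=0)  # mean-normalize first

def projection_error(X, u):
    """Total squared orthogonal distance from the points in X to the
    line through the origin with direction u."""
    u = u / np.linalg.norm(u)
    proj = np.outer(X @ u, u)   # each point's projection onto the line
    return np.sum((X - proj) ** 2)

# The diagonal direction fits this data far better than the x-axis.
err_diag = projection_error(X, np.array([1.0, 1.0]))
err_xaxis = projection_error(X, np.array([1.0, 0.0]))
```

Comparing `err_diag` and `err_xaxis` shows why the choice of line matters: the diagonal direction leaves only tiny residuals, while the x-axis leaves large ones.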
General Case: nD → kD
Now suppose $x^{(i)} \in \mathbb{R}^n$,
and we want to reduce to $k$ dimensions, with $k < n$.
Instead of finding one vector, we find $k$ vectors:

$$u^{(1)}, u^{(2)}, \dots, u^{(k)}$$
These vectors:
- Define a k-dimensional surface
- Span a k-dimensional linear subspace
We then project each point onto that subspace.
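Projecting onto a $k$-dimensional subspace spanned by orthonormal vectors is a single matrix product. A sketch with synthetic data (the shapes and the random seed are arbitrary); the fact that PCA's vectors are the top right singular vectors of the centered data matrix is standard:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))   # 100 synthetic points in R^5
X = X - X.mean(axis=0)

# PCA's vectors u^(1), ..., u^(k) are the top-k right singular vectors
# of the centered data matrix.
k = 2
U = np.linalg.svd(X, full_matrices=False)[2][:k].T  # shape (5, k), orthonormal columns

# Project every point onto the subspace spanned by the columns of U.
X_proj = X @ U @ U.T   # still in R^5, but lying on the k-dim subspace
```

Because the columns of `U` are orthonormal, `U @ U.T` is a projection matrix: applying it a second time leaves `X_proj` unchanged.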
3D → 2D Example
If $x^{(i)} \in \mathbb{R}^3$
and we reduce to 2D:
- We find two vectors $u^{(1)}$ and $u^{(2)}$.
- These define a plane.
- Each point is projected onto that plane.
The projection error is:

$$\left\| x^{(i)} - x_{\text{proj}}^{(i)} \right\|^2$$

where:
- $x_{\text{proj}}^{(i)}$ is the projected version of $x^{(i)}$

PCA minimizes:

$$\frac{1}{m} \sum_{i=1}^{m} \left\| x^{(i)} - x_{\text{proj}}^{(i)} \right\|^2$$
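For one concrete 3D → 2D projection, take a plane spanned by two orthonormal vectors. The choice below is deliberately simple (the x- and y-axes, so the plane is the xy-plane) to make the result easy to verify by hand:

```python
import numpy as np

# Two orthonormal vectors spanning a plane in R^3 (a deliberately simple
# choice: the x- and y-axes, so the plane is the xy-plane).
u1 = np.array([1.0, 0.0, 0.0])
u2 = np.array([0.0, 1.0, 0.0])
U = np.column_stack([u1, u2])      # shape (3, 2)

x = np.array([2.0, -1.0, 0.5])
x_proj = U @ (U.T @ x)             # projection of x onto the plane
error = np.sum((x - x_proj) ** 2)  # squared projection error ||x - x_proj||^2
# Projecting onto the xy-plane just zeroes the z-coordinate,
# so x_proj = [2, -1, 0] and error = 0.5**2 = 0.25.
```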
2D → 1D Example
We want to find a vector $u^{(1)} \in \mathbb{R}^2$
that defines the direction of the line.
PCA solves:

$$\min_{u^{(1)}} \frac{1}{m} \sum_{i=1}^{m} \left\| x^{(i)} - x_{\text{proj}}^{(i)} \right\|^2$$
So PCA finds the direction that minimizes the total squared orthogonal distance.
Important:
- If PCA returns $u^{(1)}$ or $-u^{(1)}$, it does not matter.
- Both define the same line.
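The sign ambiguity can be verified directly: $u$ and $-u$ give exactly the same objective value (the toy data below is made up):

```python
import numpy as np

X = np.array([[1.0, 1.0], [2.0, 2.1], [3.0, 2.9]])
X = X - X.mean(axis=0)

def projection_error(X, u):
    """Total squared orthogonal distance to the line with direction u."""
    u = u / np.linalg.norm(u)
    proj = np.outer(X @ u, u)
    return np.sum((X - proj) ** 2)

u = np.array([1.0, 1.0])
# u and -u span the same line, so the projection error is identical.
same = np.isclose(projection_error(X, u), projection_error(X, -u))
```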
Preprocessing Step
Before applying PCA, it is standard to:
- Perform mean normalization
- Perform feature scaling
So that:
- Each feature has zero mean
- Features have comparable ranges
This prevents one feature from dominating purely due to scale.
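A common way to do both steps at once is standardization. A sketch with hypothetical features on very different scales (house size in square feet vs. number of bedrooms — the values are invented):

```python
import numpy as np

# Hypothetical features on very different scales:
# house size in square feet vs. number of bedrooms.
X = np.array([[2100.0, 3.0],
              [1600.0, 2.0],
              [2400.0, 4.0],
              [1400.0, 2.0]])

mu = X.mean(axis=0)      # mean normalization
sigma = X.std(axis=0)    # feature scaling (standardization)
X_scaled = (X - mu) / sigma
```

After this step each feature has zero mean and unit variance, so neither one dominates the projection-error objective purely because of its units.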
PCA vs Linear Regression (Very Important)
PCA is NOT linear regression.
Linear Regression:
- Predicts a special target variable $y$
- Minimizes vertical squared errors
- Error is measured in the y-direction only
PCA:
- Has no special target variable
- All features are treated equally
- Minimizes orthogonal (shortest) distance to a line/plane
Linear regression minimizes:

$$\sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$$

PCA minimizes:

$$\frac{1}{m} \sum_{i=1}^{m} \left\| x^{(i)} - x_{\text{proj}}^{(i)} \right\|^2$$
These are completely different objectives.
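The difference can be made concrete on synthetic 2D data (the slope 0.8, the noise level, and the seed are all arbitrary choices). Each method wins on its own objective: regression has the smaller vertical error, PCA the smaller orthogonal error:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(-1.0, 1.0, size=50)
y = 0.8 * x + rng.normal(scale=0.1, size=50)  # noisy line (synthetic data)
X = np.column_stack([x, y])
X = X - X.mean(axis=0)                        # center both features
xc, yc = X[:, 0], X[:, 1]

# Linear regression (through the origin, since the data is centered):
# minimizes VERTICAL squared error sum((y - s*x)^2).
s_reg = (xc @ yc) / (xc @ xc)

# PCA: the first right singular vector of X minimizes ORTHOGONAL squared error.
u = np.linalg.svd(X, full_matrices=False)[2][0]
s_pca = u[1] / u[0]

def vertical_error(s):
    return np.sum((yc - s * xc) ** 2)

def orthogonal_error(v):
    v = v / np.linalg.norm(v)
    proj = np.outer(X @ v, v)
    return np.sum((X - proj) ** 2)
```

`vertical_error(s_reg) <= vertical_error(s_pca)` and `orthogonal_error(u) <= orthogonal_error([1, s_reg])` both hold by construction, which is exactly the sense in which the two objectives differ.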
Final Summary
PCA:
- Finds a lower-dimensional subspace
- Projects data onto that subspace
- Minimizes squared orthogonal projection error
- Treats all features symmetrically
- Is not a predictive model
Formally, PCA solves:

$$\min \frac{1}{m} \sum_{i=1}^{m} \left\| x^{(i)} - x_{\text{proj}}^{(i)} \right\|^2$$

where $x_{\text{proj}}^{(i)}$ is the projection of $x^{(i)}$ onto a $k$-dimensional subspace.
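Putting all the pieces together, here is a minimal NumPy-only PCA (mean normalization, SVD for the directions, projection) on a synthetic elongated point cloud; the data shape and axis scales are arbitrary:

```python
import numpy as np

def pca(X, k):
    """Return the top-k principal directions, the k-dim representation,
    and the reconstruction of each point on the k-dim subspace.

    Minimizes (1/m) * sum ||x_i - x_proj_i||^2 over all k-dim subspaces."""
    X = X - X.mean(axis=0)                        # mean normalization
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    U = Vt[:k].T                                  # columns u^(1), ..., u^(k)
    Z = X @ U                                     # k-dimensional representation
    X_proj = Z @ U.T                              # reconstruction in R^n
    return U, Z, X_proj

rng = np.random.default_rng(2)
# Elongated cloud: large spread along two axes, tiny spread along the third.
X = rng.normal(size=(200, 3)) @ np.diag([3.0, 1.0, 0.1])
U, Z, X_proj = pca(X, k=2)
```

Because almost all of the variance lies in two directions, the squared projection error of the 2D reconstruction is a tiny fraction of the total variance, which is precisely the objective PCA minimizes.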
