Dimensionality Reduction in Machine Learning
Learn how dimensionality reduction simplifies high-dimensional data while preserving important patterns. Explore techniques like PCA and understand how reducing features improves model performance, visualization, and computational efficiency.
Dimensionality Reduction
Dimensionality reduction is a type of unsupervised learning.
The idea is simple:
Take high-dimensional data and represent it using fewer dimensions while preserving as much important structure as possible.
- This is an approximation.
- We lose some information because we are projecting.
- But if most of the data lies near a lower-dimensional structure, the loss is small.
Usually we start with:
x ∈ R^n
And reduce to:
z ∈ R^k, with k < n
Example:
- 1000D → 100D
- 300D → 50D
Dimensionality reduction finds:
- A lower-dimensional subspace (line, plane, etc.)
- That captures most of the variance in the data
Then it projects the data onto that subspace.
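As a minimal NumPy sketch of that projection step (the direction vector here is hand-picked for illustration, not learned from data):

```python
import numpy as np

# A unit vector defining the 1D subspace (a line through the origin).
u = np.array([3.0, 4.0]) / 5.0   # unit length: (3/5)^2 + (4/5)^2 = 1

# An original 2D example.
x = np.array([6.0, 8.0])

# Projection: one number, the coordinate of x along u.
z = u @ x                        # mathematically 6*(3/5) + 8*(4/5) = 10

# Reconstruction: map the single number back into 2D space.
# Here it recovers x (up to rounding) because x lies on the line.
x_approx = z * u
```

In practice the direction is not hand-picked: it is chosen to capture as much variance as possible, which is exactly what PCA computes.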
Advantages
There are two main reasons to reduce dimensionality:
1. Data Compression
- Store fewer numbers per example
- Reduce memory and disk usage
2. Faster Learning
- Many algorithms scale with number of features
- Fewer features → faster training and prediction
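To make the compression concrete, here is a small NumPy check of the storage saved by a 1000D → 100D reduction (the array sizes are illustrative):

```python
import numpy as np

n_examples = 10_000

# 1000 features per example vs. 100 after reduction, stored as float64.
original = np.zeros((n_examples, 1000))
reduced = np.zeros((n_examples, 100))

print(original.nbytes)   # 80,000,000 bytes (~80 MB)
print(reduced.nbytes)    # 8,000,000 bytes (~8 MB): a 10x saving
```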
What Projection Means
Suppose original examples are:
x^(i) = (x1^(i), x2^(i)) ∈ R^2
After projection onto a line:
z^(i) ∈ R
So instead of storing:
(x1^(i), x2^(i))
We store:
z^(i)
One number instead of two.
Example 1: Redundant Features (2D → 1D)
Suppose:
- x1 = length in centimeters
- x2 = length in inches
These features are highly correlated.
Instead of storing:
(x1, x2)
We can project the data onto a line and represent each example with a single number:
z1
So:
- Original representation: 2 numbers per example
- Reduced representation: 1 number per example
We approximate the original data by projecting onto a line that captures the main direction of variation.
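A sketch of this example in NumPy, using synthetic cm/inch measurements (the dataset, noise level, and random seed are made up for illustration) and the SVD of the centered data to find the line:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic lengths in centimeters, plus the same lengths in inches
# (with a little measurement noise): two highly correlated features.
cm = rng.uniform(10, 200, size=100)
inch = cm / 2.54 + rng.normal(0, 0.1, size=100)
X = np.column_stack([cm, inch])

# The top right-singular vector of the centered data is the main
# direction of variation.
mu = X.mean(axis=0)
_, _, Vt = np.linalg.svd(X - mu, full_matrices=False)
u = Vt[0]                        # unit vector along the line

# Project: each 2D example becomes a single number z.
z = (X - mu) @ u                 # shape (100,)

# Reconstruct and measure how much information was lost.
X_approx = mu + np.outer(z, u)
err = np.abs(X - X_approx).max()
print(err)                       # small: the data really is nearly 1D
```

The reconstruction error is on the order of the measurement noise, confirming that one number per example is almost lossless here.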
Example 2: 3D → 2D
Now suppose:
x = (x1, x2, x3) ∈ R^3
But the data roughly lies on a plane.
Instead of keeping 3 coordinates:
(x1, x2, x3)
We project onto a 2D plane and represent each example as:
(z1, z2)
Now we only need two numbers instead of three.
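A similar sketch for the 3D → 2D case, with synthetic points generated to lie near a plane (the data-generating rule and seed are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)

# 200 points that lie almost exactly on a 2D plane inside 3D space:
# x3 is (nearly) a linear combination of x1 and x2.
a, b = rng.normal(size=(2, 200))
X = np.column_stack([a, b, 2 * a - b + rng.normal(0, 0.01, size=200)])

# The top two right-singular vectors span the best-fitting plane.
mu = X.mean(axis=0)
_, _, Vt = np.linalg.svd(X - mu, full_matrices=False)
V2 = Vt[:2]                      # shape (2, 3)

# Each 3D example is now two numbers (z1, z2).
Z = (X - mu) @ V2.T              # shape (200, 2)
print(Z.shape)
```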
Summary
Dimensionality reduction:
- Removes redundancy
- Compresses data
- Speeds up learning
- Represents high-dimensional data using fewer variables
In the next step, we usually use Principal Component Analysis (PCA) to compute the optimal projection direction mathematically.
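As a preview of that idea, here is a minimal PCA-style computation in NumPy: the top eigenvector of the covariance matrix gives the best 1D projection direction (random data stands in for a real dataset):

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(500, 5))    # stand-in dataset: 500 examples, 5 features

# Center the data and compute its covariance matrix.
Xc = X - X.mean(axis=0)
cov = Xc.T @ Xc / (len(X) - 1)

# Eigenvectors of the covariance matrix are the principal directions.
# np.linalg.eigh returns eigenvalues in ascending order, so the last
# eigenvector is the direction of maximum variance.
eigvals, eigvecs = np.linalg.eigh(cov)
u = eigvecs[:, -1]               # optimal 1D projection direction

z = Xc @ u                       # one number per example
```

The variance of `z` equals the largest eigenvalue, which is exactly the sense in which this direction is optimal.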
Example: Countries Dataset
Suppose we collect a large dataset containing statistics about countries around the world.
Each country may have 50 features, such as:
- x1 = GDP (Gross Domestic Product)
- x2 = GDP per capita
- x3 = Human Development Index
- x4 = Life expectancy
Each country is represented as:
x^(i) = (x1^(i), x2^(i), ..., x50^(i))
So every country corresponds to a 50-dimensional feature vector.
Reducing 50D to 2D
Using dimensionality reduction, we can transform each country:
x^(i) ∈ R^50 → z^(i) ∈ R^2
Now each country is represented by only two numbers:
(z1, z2)
This allows us to plot every country as a point in 2D space.
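A sketch of that pipeline on stand-in data (195 random "countries" with 50 synthetic features each; a real analysis would use the actual dataset):

```python
import numpy as np

rng = np.random.default_rng(3)

# Stand-in for the countries dataset: 195 countries, 50 features each.
X = rng.normal(size=(195, 50))

# Reduce 50D -> 2D using the top two singular directions.
mu = X.mean(axis=0)
_, _, Vt = np.linalg.svd(X - mu, full_matrices=False)
Z = (X - mu) @ Vt[:2].T

print(Z.shape)   # (195, 2): one (z1, z2) point per country, ready to plot
```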
