Dimensionality Reduction in Machine Learning
Learn how dimensionality reduction simplifies high-dimensional data while preserving important patterns. Explore techniques like PCA and understand how reducing features improves model performance, visualization, and computational efficiency.
Dimensionality Reduction
Dimensionality reduction is a type of unsupervised learning.
The idea is simple:
Take high-dimensional data and represent it using fewer dimensions while preserving as much important structure as possible.
- This is an approximation.
- We lose some information because we are projecting.
- But if most of the data lies near a lower-dimensional structure, the loss is small.
Usually we start with:
$x \in \mathbb{R}^n$
And reduce to:
$z \in \mathbb{R}^k$, where $k < n$
Example:
- 1000D → 100D
- 300D → 50D
Dimensionality reduction finds:
- A lower-dimensional subspace (line, plane, etc.)
- That captures most of the variance in the data
Then it projects the data onto that subspace.
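Here is a minimal NumPy sketch of that pipeline on synthetic data. The dimensions, the random seed, and the use of an SVD to find the high-variance directions are illustrative choices, not part of any fixed recipe:

```python
import numpy as np

# Sketch: find the k directions of largest variance with an SVD,
# then project every example onto the subspace they span.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 10)) @ rng.normal(size=(10, 10))  # correlated 10D data

k = 3
X_centered = X - X.mean(axis=0)                # center so the subspace passes through the mean
U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
W = Vt[:k].T                                   # top-k right singular vectors span the subspace
Z = X_centered @ W                             # projection: each example is now k numbers

print(X.shape, "->", Z.shape)                  # (500, 10) -> (500, 3)
```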
Advantages
There are two main reasons to reduce dimensionality:
1. Data Compression
- Store fewer numbers per example
- Reduce memory and disk usage
2. Faster Learning
- Many algorithms scale with number of features
- Fewer features → faster training and prediction
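As a rough illustration of the compression advantage, the sketch below compares the memory footprint of the same number of examples stored with 1000 features versus 100 features. The array sizes are made-up numbers chosen to match the earlier 1000D → 100D example:

```python
import numpy as np

# Illustrative only: memory for 10,000 examples stored as float64
# arrays with 1000 features versus 100 features.
n = 10_000
original = np.zeros((n, 1000))   # 1000D representation
reduced = np.zeros((n, 100))     # 100D representation

print(f"original: {original.nbytes / 1e6:.0f} MB")  # 80 MB
print(f"reduced:  {reduced.nbytes / 1e6:.0f} MB")   # 8 MB
```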
What Projection Means
Suppose original examples are:
$x^{(i)} \in \mathbb{R}^2$
After projection onto a line:
$z^{(i)} \in \mathbb{R}$
So instead of storing:
$(x_1^{(i)}, x_2^{(i)})$
We store:
$z^{(i)}$
One number instead of two.
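The arithmetic is just a dot product. In this sketch, both the point and the line's direction are made-up values chosen only to show the computation:

```python
import numpy as np

# Projecting one 2D point onto a line through the origin: z = x . u,
# where u is a unit vector along the line.
u = np.array([1.0, 1.0]) / np.sqrt(2)   # unit vector along the line
x = np.array([3.0, 4.0])                # original example: two numbers

z = x @ u                               # the single number we store
x_approx = z * u                        # reconstruction back in 2D

print(z)          # about 4.95
print(x_approx)   # [3.5 3.5], an approximation of [3. 4.]
```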
Example 1: Redundant Features (2D → 1D)
Suppose:
- $x_1$ = length in centimeters
- $x_2$ = length in inches
These features are highly correlated.
Instead of storing:
$(x_1, x_2)$
We can project the data onto a line and represent each example with a single number:
$z_1$
So:
- Original representation: 2 numbers per example
- Reduced representation: 1 number per example
We approximate the original data by projecting onto a line that captures the main direction of variation.
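A small sketch of this example, assuming scikit-learn is available: $x_2$ is $x_1$ divided by 2.54 plus a little measurement noise, so the points lie almost exactly on a line and one component loses almost nothing. All numbers are made up for illustration:

```python
import numpy as np
from sklearn.decomposition import PCA

# Redundant cm/inches features: the data is nearly one-dimensional.
rng = np.random.default_rng(0)
cm = rng.uniform(100, 200, size=300)
X = np.column_stack([cm, cm / 2.54 + rng.normal(0, 0.1, size=300)])

pca = PCA(n_components=1)
Z = pca.fit_transform(X)               # one number per example
X_approx = pca.inverse_transform(Z)    # map back to 2D to check the loss

print(pca.explained_variance_ratio_)   # ~[1.0]: the line captures almost all variance
print(np.abs(X - X_approx).max())      # small reconstruction error
```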
Example 2: 3D → 2D
Now suppose:
$x \in \mathbb{R}^3$
But the data roughly lies on a plane.
Instead of keeping 3 coordinates:
$(x_1, x_2, x_3)$
We project onto a 2D plane and represent each example as:
$(z_1, z_2)$
Now we only need two numbers instead of three.
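The same idea in code, again assuming scikit-learn: the third coordinate is built as a (noisy) linear combination of the first two, so the data lies close to a plane. The coefficients are arbitrary choices for the sketch:

```python
import numpy as np
from sklearn.decomposition import PCA

# Synthetic 3D data lying close to a plane.
rng = np.random.default_rng(1)
x1 = rng.normal(size=400)
x2 = rng.normal(size=400)
x3 = 2 * x1 - x2 + rng.normal(0, 0.05, size=400)  # near-planar third coordinate
X = np.column_stack([x1, x2, x3])

pca = PCA(n_components=2)
Z = pca.fit_transform(X)                    # (z1, z2) per example
print(Z.shape)                              # (400, 2)
print(pca.explained_variance_ratio_.sum())  # close to 1: the plane holds most variance
```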
Summary
Dimensionality reduction:
- Removes redundancy
- Compresses data
- Speeds up learning
- Represents high-dimensional data using fewer variables
In the next step, we use Principal Component Analysis (PCA) to compute the optimal projection directions mathematically.
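As a preview, here is a from-scratch sketch of the standard PCA recipe: center the data, form the covariance matrix, and take the eigenvectors with the largest eigenvalues as the projection directions. The function name and test data are for illustration only:

```python
import numpy as np

def pca_project(X, k):
    """Project X onto its top-k principal directions (illustrative sketch)."""
    X_centered = X - X.mean(axis=0)
    cov = np.cov(X_centered, rowvar=False)   # feature-by-feature covariance
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigh returns eigenvalues in ascending order
    W = eigvecs[:, ::-1][:, :k]              # top-k directions, largest eigenvalues first
    return X_centered @ W                    # projected data, shape (m, k)

rng = np.random.default_rng(2)
X = rng.normal(size=(200, 5)) @ rng.normal(size=(5, 5))
print(pca_project(X, k=2).shape)             # (200, 2)
```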
Example: Countries Dataset
Suppose we collect a large dataset containing statistics about countries around the world.
Each country may have 50 features, such as:
- $x_1$ = GDP (Gross Domestic Product)
- $x_2$ = GDP per capita
- $x_3$ = Human Development Index
- $x_4$ = Life expectancy
Each country is represented as:
$x^{(i)} \in \mathbb{R}^{50}$
So every country corresponds to a 50-dimensional feature vector.
Reducing 50D to 2D
Using dimensionality reduction, we can transform each country:
$x^{(i)} \in \mathbb{R}^{50} \rightarrow z^{(i)} \in \mathbb{R}^2$
Now each country is represented by only two numbers:
$(z_1, z_2)$
This allows us to plot every country as a point in 2D space.
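A sketch of that plot, using a random matrix as a hypothetical stand-in for the real countries dataset (with actual data, each row would hold a country's 50 statistics). Standardizing first matters because GDP, life expectancy, and the like live on very different scales:

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

# Hypothetical stand-in: 150 "countries" with 50 numeric features each.
rng = np.random.default_rng(3)
X = rng.normal(size=(150, 50))

X_std = (X - X.mean(axis=0)) / X.std(axis=0)   # put features on a common scale
Z = PCA(n_components=2).fit_transform(X_std)   # one 2D point per country

plt.scatter(Z[:, 0], Z[:, 1])
plt.xlabel("$z_1$")
plt.ylabel("$z_2$")
plt.title("Countries projected from 50D to 2D (hypothetical data)")
plt.show()
```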
