Anomaly Detection Using Gaussian Distribution: Detecting Outliers with Probability Models
Learn how anomaly detection works using the Gaussian (normal) distribution. Understand how to model data probabilistically, estimate parameters, compute likelihoods, and identify outliers using threshold-based decision making in machine learning systems.
Gaussian (Normal) Distribution
Understanding the Gaussian distribution is essential before building the anomaly detection algorithm. For a random variable $x$, we write:
$$x \sim \mathcal{N}(\mu, \sigma^2)$$
This means the random variable $x$ follows a Gaussian distribution with:
- Mean: $\mu$
- Variance: $\sigma^2$
The symbol $\sim$ means “is distributed as”.
Probability Density Function
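The Gaussian probability density function is:
$$p(x; \mu, \sigma^2) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right)$$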
- You do not need to memorize this formula.
- It simply defines the bell-shaped curve.
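As a minimal sketch, the standard Gaussian density, 1/(√(2π)·σ) · exp(−(x − μ)² / (2σ²)), can be evaluated with a few lines of Python. The function name `gaussian_pdf` and the sample inputs are illustrative choices, not part of the original text.

```python
import math

def gaussian_pdf(x, mu, sigma2):
    """Density of a Gaussian with mean mu and variance sigma2, evaluated at x."""
    coeff = 1.0 / math.sqrt(2.0 * math.pi * sigma2)
    return coeff * math.exp(-((x - mu) ** 2) / (2.0 * sigma2))

# Illustrative inputs: the standard normal (mu = 0, variance = 1).
peak = gaussian_pdf(0.0, 0.0, 1.0)   # density at the mean
tail = gaussian_pdf(3.0, 0.0, 1.0)   # density three standard deviations away
print(peak, tail)
```

The density is highest at the mean and falls off quickly in the tails, which is the property the anomaly detector relies on.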
Shape of the Gaussian Distribution
The Gaussian curve has the following properties:
- It is bell-shaped
- It is symmetric around the mean
- The total area under the curve equals 1
Effect of Parameters
The curve is fully defined by two parameters, so our goal is to estimate both:
1. Mean (μ)
- Controls the center of the distribution.
- Changing μ shifts the curve left or right.
Example:
- If μ = 0 → centered at 0
- If μ = 3 → centered at 3
Estimating the Mean
$$\mu = \frac{1}{m} \sum_{i=1}^{m} x^{(i)}$$
Here $m$ is the number of training examples. This is simply the average of the data.
2. Standard Deviation (σ)
This measures how far the data points typically lie from the mean.
Its square, the variance σ², is the average squared deviation from the mean.
- Controls the width of the curve.
- Smaller σ → narrower and taller curve
- Larger σ → wider and flatter curve
Since total area must equal 1:
- If the curve gets wider → it becomes shorter
- If the curve gets narrower → it becomes taller
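Estimating the Variance
$$\sigma^2 = \frac{1}{m} \sum_{i=1}^{m} \left(x^{(i)} - \mu\right)^2$$
This is the average squared deviation of the $m$ examples from the mean.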
Note on 1/m vs 1/(m−1)
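In statistics, you may sometimes see:
$$\sigma^2 = \frac{1}{m-1} \sum_{i=1}^{m} \left(x^{(i)} - \mu\right)^2$$
instead of:
$$\sigma^2 = \frac{1}{m} \sum_{i=1}^{m} \left(x^{(i)} - \mu\right)^2$$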
In machine learning, we usually use 1/m. When the dataset is large, the difference is very small in practice.
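The two normalizations can be compared directly. The sketch below uses a small made-up sample (the values in `data` are hypothetical) and Python's standard `statistics` module, where `pvariance` divides by m and `variance` divides by m − 1.

```python
import statistics

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # made-up sample for illustration
m = len(data)

mu = sum(data) / m                                   # sample mean
squared_dev = sum((x - mu) ** 2 for x in data)       # sum of squared deviations

var_m = squared_dev / m                # 1/m version used in this article
var_m_minus_1 = squared_dev / (m - 1)  # 1/(m-1) version common in statistics

# The standard library provides both conventions.
print(var_m, statistics.pvariance(data))
print(var_m_minus_1, statistics.variance(data))
```

With only eight points the two estimates differ noticeably; the gap shrinks as m grows, since the ratio between them is (m − 1)/m.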
Examples
Case 1: μ = 0, σ = 1
This is the standard normal distribution.
- Centered at 0
- Moderate width
Case 2: μ = 0, σ = 0.5
- Still centered at 0
- Much narrower
- Taller curve
- Variance = σ² = 0.25
Case 3: μ = 0, σ = 2
- Centered at 0
- Much wider
- Flatter curve
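The three cases can be checked numerically. This is a rough sketch: `gaussian_pdf` and `approx_area` are illustrative helper names, and the area is approximated with a simple midpoint sum over the interval [−20, 20], which is assumed to be wide enough to capture almost all of the probability mass for these values of σ.

```python
import math

def gaussian_pdf(x, mu, sigma):
    """Gaussian density at x for mean mu and standard deviation sigma."""
    return math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2)) / (sigma * math.sqrt(2.0 * math.pi))

def approx_area(mu, sigma, lo=-20.0, hi=20.0, steps=40000):
    """Midpoint Riemann sum of the density over [lo, hi]."""
    width = (hi - lo) / steps
    return sum(gaussian_pdf(lo + (k + 0.5) * width, mu, sigma) for k in range(steps)) * width

results = {}
for sigma in (1.0, 0.5, 2.0):
    peak = gaussian_pdf(0.0, 0.0, sigma)        # height of the curve at the mean
    results[sigma] = (peak, approx_area(0.0, sigma))
    print(f"sigma={sigma}: peak={peak:.4f}, area={results[sigma][1]:.4f}")
```

The peak height is 1/(σ√(2π)), so it is roughly 0.40 for σ = 1, 0.80 for σ = 0.5, and 0.20 for σ = 2, while the approximate area stays close to 1 in every case.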
Anomaly Detection Using Gaussian Distribution
In this approach, we build an anomaly detection algorithm by modeling the probability of data using Gaussian distributions.
- Model each feature using a Gaussian distribution.
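- Estimate $\mu_j$ and $\sigma_j^2$ for each feature from training data.
- Compute $p(x)$ for new examples.
- Flag examples where $p(x) < \varepsilon$, for a chosen threshold $\varepsilon$.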
Low probability → likely anomaly.
Problem Setup
We are given:
- An unlabeled training set of $m$ examples: $\{x^{(1)}, x^{(2)}, \dots, x^{(m)}\}$
- Each example $x^{(i)}$ is a feature vector in $\mathbb{R}^n$
Examples:
- Aircraft engine sensor data
- User behavior features
- System monitoring metrics
The goal is to determine whether a new example is normal or anomalous.
📚 1. Training Phase
- Choose relevant features.
- Compute the mean $\mu_j$ of each feature.
- Compute the variance $\sigma_j^2$ of each feature.
Modeling
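We model the probability of a data point $x$ using Gaussian distributions:
$$p(x) = p(x_1; \mu_1, \sigma_1^2) \, p(x_2; \mu_2, \sigma_2^2) \cdots p(x_n; \mu_n, \sigma_n^2)$$
where $n$ is the number of features.
Each feature is modeled using a Gaussian distribution:
$$x_j \sim \mathcal{N}(\mu_j, \sigma_j^2)$$
The symbol $\sim$ means “is distributed as”.
This means the random variable $x_j$ follows a Gaussian distribution with:
- Mean $\mu_j$
- Variance $\sigma_j^2$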
Parameter Estimation
Given training data, we estimate parameters.
Mean
For each feature $j$:
$$\mu_j = \frac{1}{m} \sum_{i=1}^{m} x_j^{(i)}$$
This is the average value of feature $j$.
Variance
$$\sigma_j^2 = \frac{1}{m} \sum_{i=1}^{m} \left(x_j^{(i)} - \mu_j\right)^2$$
This measures how spread out the values of feature $j$ are.
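The per-feature estimates can be sketched in plain Python as follows. The function name `estimate_gaussian` and the tiny training set `X_train` are hypothetical, chosen only to keep the arithmetic easy to follow. The variance uses the 1/m normalization.

```python
def estimate_gaussian(X):
    """Return per-feature means and variances (1/m normalization).

    X is a list of m examples, each a list of n feature values.
    """
    m = len(X)
    n = len(X[0])
    mu = [sum(x[j] for x in X) / m for j in range(n)]
    sigma2 = [sum((x[j] - mu[j]) ** 2 for x in X) / m for j in range(n)]
    return mu, sigma2

# Hypothetical training set: 3 examples, 2 features each.
X_train = [[1.0, 10.0], [2.0, 12.0], [3.0, 14.0]]
mu, sigma2 = estimate_gaussian(X_train)
print(mu)      # per-feature means
print(sigma2)  # per-feature variances
```

For this data the means are 2 and 12, and the variances are 2/3 and 8/3.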
Compute Probability of Examples
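We assume the features are independent, so:
$$p(x) = \prod_{j=1}^{n} p(x_j; \mu_j, \sigma_j^2)$$
Here, $\prod$ denotes a product (multiplication over a range).
Where each feature probability is:
$$p(x_j; \mu_j, \sigma_j^2) = \frac{1}{\sqrt{2\pi}\,\sigma_j} \exp\left(-\frac{(x_j - \mu_j)^2}{2\sigma_j^2}\right)$$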
This approach is called density estimation.
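A minimal sketch of the density computation, assuming the per-feature means and variances have already been estimated. The names `gaussian_pdf` and `density` and the parameter values are illustrative, not taken from the original.

```python
import math

def gaussian_pdf(x, mu, sigma2):
    """Univariate Gaussian density with mean mu and variance sigma2."""
    return math.exp(-((x - mu) ** 2) / (2.0 * sigma2)) / math.sqrt(2.0 * math.pi * sigma2)

def density(x, mu, sigma2):
    """p(x) as the product of per-feature Gaussian densities (independence assumed)."""
    return math.prod(gaussian_pdf(x_j, mu_j, s2_j) for x_j, mu_j, s2_j in zip(x, mu, sigma2))

# Illustrative parameters: two features, each with mean 0 and variance 1.
mu = [0.0, 0.0]
sigma2 = [1.0, 1.0]

p_center = density([0.0, 0.0], mu, sigma2)  # at the mean of both features
p_far = density([4.0, -4.0], mu, sigma2)    # far from the mean in both features
print(p_center, p_far)
```

With many features, a product of small densities can underflow to zero in floating point. A common workaround is to sum the logarithms of the per-feature densities and compare against the log of the threshold.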
🔎 2. Making Predictions
Detection Phase
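For a new example $x$:
Step 1: Compute probability $p(x)$
Step 2: Choose threshold $\varepsilon$
Decision Rule
Compare $p(x)$ with $\varepsilon$.
- If $p(x) < \varepsilon$: flag as anomaly (the probability is low).
- If $p(x) \geq \varepsilon$: treat as normal.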
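Putting the training and detection phases together, a rough end-to-end sketch might look like this. The training data, the helper names (`fit`, `density`, `is_anomaly`), and the threshold value `epsilon = 1e-3` are all hypothetical. In practice the threshold has to be chosen for the data at hand.

```python
import math

def fit(X):
    """Estimate per-feature mean and variance (1/m) from a list of examples."""
    m, n = len(X), len(X[0])
    mu = [sum(x[j] for x in X) / m for j in range(n)]
    sigma2 = [sum((x[j] - mu[j]) ** 2 for x in X) / m for j in range(n)]
    return mu, sigma2

def density(x, mu, sigma2):
    """p(x): product of independent per-feature Gaussian densities."""
    return math.prod(
        math.exp(-((x_j - m_j) ** 2) / (2.0 * s_j)) / math.sqrt(2.0 * math.pi * s_j)
        for x_j, m_j, s_j in zip(x, mu, sigma2)
    )

def is_anomaly(x, mu, sigma2, epsilon):
    """Decision rule: flag x when its density falls below the threshold."""
    return density(x, mu, sigma2) < epsilon

# Hypothetical training data (two features) and threshold.
X_train = [[1.0, 10.0], [2.0, 12.0], [3.0, 14.0], [2.0, 11.0], [2.0, 13.0]]
epsilon = 1e-3

mu, sigma2 = fit(X_train)
typical_flag = is_anomaly([2.0, 12.0], mu, sigma2, epsilon)  # near the training mean
outlier_flag = is_anomaly([6.0, 20.0], mu, sigma2, epsilon)  # far from the training data
print(typical_flag, outlier_flag)
```

This sketch assumes every feature has non-zero variance in the training set; a constant feature would cause a division by zero and would need separate handling.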
Intuition (2D Case)
When $n = 2$:
- Each feature has its own Gaussian distribution.
- Their product forms a bell-shaped probability surface over the two-feature plane (a 3D plot).
- High probability regions form an ellipse-shaped area centered on the feature means.
- Points outside that region have low probability and are flagged as anomalies.
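One way to see where the ellipse comes from, assuming two independent Gaussian features with means $\mu_1, \mu_2$ and variances $\sigma_1^2, \sigma_2^2$, and a threshold written here as $\varepsilon$, is to expand the product and rearrange the condition for a point to be treated as normal:

$$p(x) = \frac{1}{2\pi \sigma_1 \sigma_2} \exp\left(-\frac{(x_1 - \mu_1)^2}{2\sigma_1^2} - \frac{(x_2 - \mu_2)^2}{2\sigma_2^2}\right) \geq \varepsilon$$

$$\iff \quad \frac{(x_1 - \mu_1)^2}{\sigma_1^2} + \frac{(x_2 - \mu_2)^2}{\sigma_2^2} \leq -2 \ln\left(2\pi \sigma_1 \sigma_2 \, \varepsilon\right)$$

When the right-hand side is positive, this is the interior of an ellipse centered at $(\mu_1, \mu_2)$ whose axes are aligned with the feature axes. A smaller $\varepsilon$ gives a larger ellipse, so fewer points are flagged.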
