

Anomaly Detection Using Gaussian Distribution: Detecting Outliers with Probability Models

Learn how anomaly detection works using the Gaussian (normal) distribution. Understand how to model data probabilistically, estimate parameters, compute likelihoods, and identify outliers using threshold-based decision making in machine learning systems.

Written by Hitesh Sahu, a passionate developer and blogger.

Fri Feb 27 2026


Gaussian Distribution / Normal Distribution

This understanding of the Gaussian distribution is essential before building the anomaly detection algorithm.

$x \sim \mathcal{N}(\mu, \sigma^2)$

This means the random variable $x$ follows a Gaussian distribution with:

  • Mean: $\mu$
  • Variance: $\sigma^2$

The symbol $\sim$ means "is distributed as".

Probability Density Function

The Gaussian probability density function is:

$$p(x;\mu,\sigma^2) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left( -\frac{(x-\mu)^2}{2\sigma^2} \right)$$
  • You do not need to memorize this formula.
  • It simply defines the bell-shaped curve.
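The bell-shaped curve above can be evaluated directly. Here is a minimal sketch of the density formula in plain Python (the function name `gaussian_pdf` is my own, not from the original post):

```python
import math

def gaussian_pdf(x, mu, sigma2):
    """Probability density of x under N(mu, sigma2)."""
    return (1.0 / math.sqrt(2 * math.pi * sigma2)) * math.exp(-(x - mu) ** 2 / (2 * sigma2))

# Density of a standard normal at its mean: 1/sqrt(2*pi) ≈ 0.3989
print(round(gaussian_pdf(0.0, 0.0, 1.0), 4))  # 0.3989
```

Note that the curve is symmetric: `gaussian_pdf(1, 0, 1)` equals `gaussian_pdf(-1, 0, 1)`.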

Shape of the Gaussian Distribution

The Gaussian curve has the following properties:

  • It is bell-shaped
  • It is symmetric around the mean
  • The total area under the curve equals 1
Effect of Parameters

Given training examples $x^{(1)}, x^{(2)}, \dots, x^{(m)}$, the curve is fully defined by two parameters, so our goal is to estimate:

1. Mean (μ)

  • Controls the center of the distribution.
  • Changing μ shifts the curve left or right.

Example:

  • If μ = 0 → centered at 0
  • If μ = 3 → centered at 3

Estimating the Mean

$$\mu = \frac{1}{m} \sum_{i=1}^{m} x^{(i)}$$

This is simply the average of the data.

2. Standard Deviation (σ)

This measures how far the data points typically are from the mean.

The variance $\sigma^2$ is the average squared deviation from the mean; the standard deviation $\sigma$ is its square root.

  • Controls the width of the curve.
  • Smaller σ → narrower and taller curve
  • Larger σ → wider and flatter curve

Since total area must equal 1:

  • If the curve gets wider → it becomes shorter
  • If the curve gets narrower → it becomes taller

Estimating the Variance

$$\sigma^2 = \frac{1}{m} \sum_{i=1}^{m} \left(x^{(i)} - \mu\right)^2$$

Note on 1/m vs 1/(m−1)

In statistics, you may sometimes see:

$\frac{1}{m-1}$

instead of:

$\frac{1}{m}$

In machine learning, we usually use 1/m. When the dataset is large, the difference is very small in practice.
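The two estimators differ only in the divisor, which NumPy exposes through the `ddof` argument. A quick sketch on a toy dataset (the values are mine, chosen for clean arithmetic):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
m = len(x)

mu = x.mean()                 # (1/m) * sum of x
var_ml = x.var(ddof=0)        # 1/m      (ML estimate, the convention used here)
var_unbiased = x.var(ddof=1)  # 1/(m-1)  (unbiased estimate from statistics)

print(mu, var_ml, var_unbiased)  # 3.0 2.0 2.5
```

With only five points the two variances already differ by 25%; as $m$ grows the gap shrinks toward zero, which is why the choice rarely matters in machine learning.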

Examples

Case 1: μ = 0, σ = 1

This is the standard normal distribution.

  • Centered at 0
  • Moderate width

Case 2: μ = 0, σ = 0.5

  • Still centered at 0
  • Much narrower
  • Taller curve
  • Variance: $\sigma^2 = 0.25$

Case 3: μ = 0, σ = 2

  • Centered at 0
  • Much wider
  • Flatter curve
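The three cases above follow from the fact that the peak of the curve, at $x = \mu$, has height $\frac{1}{\sqrt{2\pi}\,\sigma}$, so halving $\sigma$ doubles the height. A small sketch (the helper `peak_height` is my own name):

```python
import math

def peak_height(sigma):
    # Maximum of the Gaussian PDF, reached at x = mu: 1 / (sqrt(2*pi) * sigma)
    return 1.0 / (math.sqrt(2 * math.pi) * sigma)

for sigma in (1.0, 0.5, 2.0):
    print(sigma, round(peak_height(sigma), 4))
# 1.0 0.3989
# 0.5 0.7979
# 2.0 0.1995
```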

Anomaly Detection Using Gaussian Distribution

In this approach, we build an anomaly detection algorithm by modeling the probability of data using Gaussian distributions.

  1. Model each feature using a Gaussian distribution.
  2. Estimate $\mu$ and $\sigma^2$ from training data.
  3. Compute $p(x)$ for new examples.
  4. Flag examples where $p(x) < \varepsilon$.

Low probability → likely anomaly.

Problem Setup

We are given:

  • An unlabeled training set of $m$ examples:

    $x^{(1)}, x^{(2)}, \dots, x^{(m)}$
  • Each example is a feature vector in $\mathbb{R}^n$

Examples:

  • Aircraft engine sensor data
  • User behavior features
  • System monitoring metrics

The goal is to determine whether a new example is normal or anomalous.

📚 1. Training Phase

  1. Choose relevant features.
  2. Compute $\mu_1, \dots, \mu_n$.
  3. Compute $\sigma_1^2, \dots, \sigma_n^2$.

Modeling $p(x)$

We model the probability of a data point using Gaussian Distribution:

$$p(x) = p(x_1, x_2, \dots, x_n), \quad x \in \mathbb{R}^n$$

where $n$ is the number of features.

Each feature is modeled using a Gaussian distribution:

$$x_j \sim \mathcal{N}(\mu_j, \sigma_j^2)$$

That is, feature $x_j$ follows a Gaussian distribution with:

  • Mean $\mu_j$
  • Variance $\sigma_j^2$

Parameter Estimation

Given training data, we estimate parameters.

Mean

For each feature $j$:

$$\mu_j = \frac{1}{m} \sum_{i=1}^{m} x_j^{(i)}$$

This is the average value of feature $j$.

Variance

$$\sigma_j^2 = \frac{1}{m} \sum_{i=1}^{m} \left(x_j^{(i)} - \mu_j\right)^2$$

This measures how spread out the feature values are.

Compute Probability of Examples

We assume the features are independent, so:

$$p(x) = \prod_{j=1}^{n} p(x_j)$$

Here, $\prod$ denotes a product (multiplication over a range). With the estimated per-feature parameters:

$$p(x) = \prod_{j=1}^{n} p(x_j; \mu_j, \sigma_j^2)$$

Where each feature probability is:

$$p(x_j) = \frac{1}{\sqrt{2\pi}\,\sigma_j} \exp\left( -\frac{(x_j - \mu_j)^2}{2\sigma_j^2} \right)$$

This approach is called density estimation.
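The training phase and the density product can be sketched together in a few lines of NumPy. This is a minimal illustration, not a production implementation; the function names `estimate_gaussian` and `density` and the toy data are my own:

```python
import numpy as np

def estimate_gaussian(X):
    """Per-feature mean and variance (1/m convention) for an (m, n) data matrix."""
    mu = X.mean(axis=0)
    sigma2 = X.var(axis=0)  # ddof=0 -> divides by m
    return mu, sigma2

def density(X, mu, sigma2):
    """p(x) for each row of X: product of the independent per-feature Gaussian densities."""
    coeff = 1.0 / np.sqrt(2 * np.pi * sigma2)
    exponent = np.exp(-(X - mu) ** 2 / (2 * sigma2))
    return np.prod(coeff * exponent, axis=1)

# Toy training set with two features
X_train = np.array([[1.0, 10.0], [2.0, 12.0], [3.0, 11.0], [2.0, 11.0]])
mu, sigma2 = estimate_gaussian(X_train)

# A point near the mean gets a much higher density than a far-away one
print(density(np.array([[2.0, 11.0], [8.0, 30.0]]), mu, sigma2))
```

In practice the product of many small densities underflows quickly, so real implementations usually sum log-densities instead of multiplying raw probabilities.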

🔎 2. Making Predictions

Detection Phase

For a new example $x_{\text{test}}$:

Step 1: Compute probability

$$p(x_{\text{test}}) = \prod_{j=1}^{n} p(x_{\text{test},j}; \mu_j, \sigma_j^2)$$

Step 2: Choose threshold $\varepsilon$

Decision Rule

Compare $p(x_{\text{test}})$ with $\varepsilon$:

$$\text{If } p(x_{\text{test}}) < \varepsilon \Rightarrow \text{Anomaly}$$
$$\text{If } p(x_{\text{test}}) \ge \varepsilon \Rightarrow \text{Normal}$$

Flag as anomaly if probability is low.
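The decision rule itself is a one-line comparison. A sketch, assuming the densities have already been computed; the threshold value here is arbitrary (in practice $\varepsilon$ is tuned on a labeled validation set):

```python
import numpy as np

EPSILON = 1e-3  # hypothetical threshold; tune on a labeled validation set

def predict_anomaly(p, epsilon=EPSILON):
    """Return True wherever the density falls below the threshold."""
    return p < epsilon

p_test = np.array([0.05, 2e-4, 0.8])
print(predict_anomaly(p_test))  # [False  True False]
```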


Intuition (2D Case)

When $n = 2$:

  • Each feature has its own Gaussian distribution.
  • Their product forms a 3D probability surface.
  • High probability regions form an ellipse-shaped area.
  • Points outside that region have low probability and are flagged as anomalies.
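The elliptical shape comes from the two features having different variances. A small self-contained sketch (parameters chosen by me for illustration) shows that a point two units away along the high-variance axis is more probable than one two units away along the low-variance axis:

```python
import numpy as np

# Two independent features with different spreads -> elliptical high-density region
mu = np.array([0.0, 0.0])
sigma2 = np.array([1.0, 4.0])  # feature 2 varies more

def p(x):
    coeff = 1.0 / np.sqrt(2 * np.pi * sigma2)
    return np.prod(coeff * np.exp(-(x - mu) ** 2 / (2 * sigma2)))

# Same distance from the mean, different probabilities
print(p(np.array([0.0, 2.0])) > p(np.array([2.0, 0.0])))  # True
```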