
Anomaly Detection Using Multivariate Gaussian Distribution

Learn anomaly detection using multivariate Gaussian distribution to identify unusual patterns and correlated outliers in datasets. Understand covariance matrices, parameter estimation, probability density functions, and threshold-based anomaly detection techniques used in machine learning systems.

Written by Hitesh Sahu, a passionate developer and blogger.

Fri Feb 27 2026


Multivariate Gaussian Distribution

The multivariate Gaussian distribution is an extension of the normal (Gaussian) distribution to multiple variables.

It is commonly used in:

  • anomaly detection
  • probabilistic modeling
  • Gaussian mixture models
  • Bayesian ML
  • Kalman filters

Motivation

Suppose we are monitoring machines in a data center.

Features:

Feature   Meaning
x₁        CPU Load
x₂        Memory Usage

Most machines behave normally:

High CPU  ↔ High Memory
Low CPU   ↔ Low Memory

The features are correlated.

Problem with Basic Gaussian Anomaly Detection

Earlier anomaly detection assumed:

p(x) = p(x₁) p(x₂)

meaning:

  • features are independent

Why This Fails

Suppose we observe:

CPU Load   = Very Low
Memory Use = Very High

Individually:

  • CPU value looks normal
  • Memory value looks normal

But together:

  • this combination is strange

Visual Intuition

Normal data may lie along a diagonal trend:

Low CPU  → Low Memory
High CPU → High Memory

An anomalous point:

Low CPU + High Memory

lies far away from the normal pattern.

Basic Gaussian fails because it ignores feature correlation.

Solution: Multivariate Gaussian

Instead of modeling each feature separately:

p(x₁), p(x₂), ..., p(xₙ)

we model:

p(x)

all together.

Multivariate Gaussian Formula

The probability density is:

p(x; \mu, \Sigma) = \frac{1}{(2\pi)^{n/2} |\Sigma|^{1/2}} \exp\left( -\frac{1}{2} (x-\mu)^T \Sigma^{-1} (x-\mu) \right)

No need to memorize this formula.
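As a sanity check, the density can be evaluated directly from this formula with NumPy (a minimal sketch; the helper name and the test point are illustrative):

```python
import numpy as np

def multivariate_gaussian_pdf(x, mu, sigma):
    """Evaluate p(x; mu, Sigma) at a single point x."""
    n = mu.shape[0]
    diff = x - mu
    # Normalization constant: (2*pi)^(n/2) * |Sigma|^(1/2)
    norm = (2 * np.pi) ** (n / 2) * np.sqrt(np.linalg.det(sigma))
    # Quadratic form: (x - mu)^T Sigma^{-1} (x - mu)
    quad = diff @ np.linalg.inv(sigma) @ diff
    return np.exp(-0.5 * quad) / norm

mu = np.array([0.0, 0.0])
sigma = np.eye(2)  # unit variances, no correlation
p_peak = multivariate_gaussian_pdf(mu, mu, sigma)
print(p_peak)  # density at the mean: 1/(2*pi) ≈ 0.159
```

The density is largest at the mean and falls off with the quadratic form, which is exactly the squared Mahalanobis distance from the center.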

Parameters

1. Mean Vector (μ)

Represents the center of the distribution.

Example:

\mu = \begin{bmatrix} 0 \\ 0 \end{bmatrix}

means:

  • centered at origin

2. Covariance Matrix (Σ)

An n × n matrix describing:

  • variances
  • correlations

Covariance Matrix Structure

For 2 features:

\Sigma = \begin{bmatrix} \sigma_1^2 & \text{covariance} \\ \text{covariance} & \sigma_2^2 \end{bmatrix}

1. Diagonal Entries

Represent variance of each feature.

Example:

\Sigma = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}

means:

  • both features have equal variance
  • no correlation
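With a diagonal Σ like this, the joint density factorizes into the product of the per-feature Gaussians, which is exactly the independence assumption of the basic model. A quick numerical check (the test point is arbitrary):

```python
import numpy as np
from scipy.stats import multivariate_normal, norm

x = np.array([0.7, -1.2])  # arbitrary test point

# Joint density with identity covariance (diagonal, unit variances)
joint = multivariate_normal(mean=[0, 0], cov=np.eye(2)).pdf(x)

# Product of the two univariate standard normal densities: p(x1) * p(x2)
product = norm.pdf(x[0]) * norm.pdf(x[1])

print(np.isclose(joint, product))  # True: diagonal Sigma == independent features
```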

2. Off-Diagonal Entries

Represent how features vary together (the covariance; in the examples below both variances are 1, so this equals the correlation).

Positive Correlation

Example:

\Sigma = \begin{bmatrix} 1 & 0.8 \\ 0.8 & 1 \end{bmatrix}

Meaning:

When x1 increases
x2 also tends to increase

Probability mass lies near:

x₁ ≈ x₂

Visual Shape

Positive Correlation

Negative Correlation

Example:

\Sigma = \begin{bmatrix} 1 & -0.8 \\ -0.8 & 1 \end{bmatrix}

Meaning:

When x1 increases
x2 decreases

Probability mass lies near:

x₁ ≈ −x₂

Visual Shape

Negative Correlation
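The two covariance matrices above can be checked empirically by sampling from each distribution (a small sketch; the sample size and seed are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
mu = [0.0, 0.0]
sigma_pos = [[1.0, 0.8], [0.8, 1.0]]    # positive correlation
sigma_neg = [[1.0, -0.8], [-0.8, 1.0]]  # negative correlation

pos = rng.multivariate_normal(mu, sigma_pos, size=10_000)
neg = rng.multivariate_normal(mu, sigma_neg, size=10_000)

# The empirical correlation recovers the off-diagonal entry of Sigma
print(np.corrcoef(pos[:, 0], pos[:, 1])[0, 1])  # ≈ +0.8
print(np.corrcoef(neg[:, 0], neg[:, 1])[0, 1])  # ≈ -0.8
```

Plotting `pos` and `neg` as scatter plots reproduces the tilted elliptical shapes described above.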

3. Effect of Variance

Small Variance

\Sigma = \begin{bmatrix} 0.5 & 0 \\ 0 & 0.5 \end{bmatrix}

Produces:

  • narrow distribution
  • tightly clustered data


Large Variance

\Sigma = \begin{bmatrix} 2 & 0 \\ 0 & 2 \end{bmatrix}

Produces:

  • wider distribution
  • more spread-out data


Geometric Interpretation

Multivariate Gaussian creates:

Elliptical Probability Regions

High Probability
    ↓
   ****
 ********
**********
 ********
   ****

Points near center:

  • high probability

Points far away:

  • low probability

Anomaly Detection

Compute:

p(x)

If:

p(x) < ε

then:

x is an anomaly

Why Multivariate Gaussian Is Better

Basic Gaussian:

  • assumes features independent

Multivariate Gaussian:

  • models correlations

This helps detect anomalies like:

Low CPU + High Memory

even if each feature individually looks normal.

Comparison

Method                  Assumes Independence?   Captures Correlation?
Basic Gaussian          Yes                     No
Multivariate Gaussian   No                      Yes

Visual Comparison

Basic Gaussian

Circular contours

Assumes equal independent spread.

Multivariate Gaussian

Tilted elliptical contours

Captures relationships between variables.

Mean Shifts Distribution

Changing:

μ

moves the center.

Example:

\mu = \begin{bmatrix} 1.5 \\ -0.5 \end{bmatrix}

shifts peak to:

x1 = 1.5
x2 = -0.5

Complete Anomaly Detection Pipeline

flowchart TD
    A[Collect Normal Data] --> B[Estimate μ and Σ]
    B --> C["Compute p(x)"]
    C --> D{"p(x) < ε ?"}
    D -->|Yes| E[Anomaly]
    D -->|No| F[Normal Example]
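The pipeline above can be sketched end to end in a few lines. This is a minimal illustration: the "normal" machine data is simulated, SciPy's `multivariate_normal` supplies the density, and choosing ε as a low percentile of the training densities is just one common heuristic.

```python
import numpy as np
from scipy.stats import multivariate_normal

# Simulated "normal" machines: CPU load and memory usage move together
rng = np.random.default_rng(42)
X_train = rng.multivariate_normal([0.0, 0.0],
                                  [[1.0, 0.8], [0.8, 1.0]], size=5000)

# 1. Estimate mu and Sigma from normal data
mu_hat = X_train.mean(axis=0)
sigma_hat = np.cov(X_train, rowvar=False)

# 2. Fit the density model
model = multivariate_normal(mean=mu_hat, cov=sigma_hat)

# 3. Pick epsilon, e.g. the 0.1th percentile of training densities
epsilon = np.percentile(model.pdf(X_train), 0.1)

# 4. Classify new points
normal_point  = np.array([1.0, 1.0])   # high CPU + high memory: fits the trend
anomaly_point = np.array([-2.0, 2.0])  # low CPU + high memory: breaks it
print(model.pdf(normal_point) < epsilon)   # False -> normal
print(model.pdf(anomaly_point) < epsilon)  # True  -> anomaly
```

Note that each coordinate of the anomalous point is only two standard deviations from its mean, yet the joint density flags it, because the combination contradicts the learned correlation.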

Advantages

Captures Correlations

Very important for real-world systems.

More Expressive

Can model:

  • tilted distributions
  • elongated regions
  • correlated variables

Disadvantages

Needs more data.

Why?

Because the covariance matrix is n × n, with n(n+1)/2 unique parameters to estimate (Σ is symmetric).

For large n:

  • expensive
  • harder to estimate reliably

Practical Rule

Use Basic Gaussian When

  • features mostly independent
  • small dataset
  • high dimensionality

Use Multivariate Gaussian When

  • features strongly correlated
  • enough training data available
