
Bias-Variance Dilemma

Understanding the bias-variance tradeoff in machine learning, including the concepts of bias and variance, underfitting and overfitting, and strategies to balance model complexity for better generalization.

Written by Hitesh Sahu, a passionate developer and blogger.

Fri Feb 27 2026


Ideally, one wants to choose a model that both accurately captures the regularities in its training data and generalizes well to unseen data.

When a model performs poorly, the key question is:

Is the problem bias or variance?

  • High bias → underfitting
  • High variance → overfitting

Our goal is to find the balance between the two.

🔬 Diagnosing Bias vs. Variance

Effect of Polynomial Degree $d$

As the degree of the polynomial $d$ increases:

📚 Training error: $J_{\text{train}}(\Theta)$

  • $J_{\text{train}}(\Theta)$ steadily decreases
  • Higher-degree models fit the training data better

📘 Cross-validation error: $J_{\text{CV}}(\Theta)$

  • $J_{\text{CV}}(\Theta)$ decreases, then increases
  • Forms a convex (U-shaped) curve

Low $d$ → High bias
High $d$ → High variance
Middle $d$ → Good balance

Figure: Bias vs Variance

This behavior helps us diagnose bias vs. variance.
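The degree sweep described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the author's code: the synthetic data (a noisy sine curve), the 20/20 split, the squared-error cost with a 1/(2m) factor, and the helper names `half_mse`, `poly_features`, and `sweep_degrees` are all assumptions made for the example.

```python
import numpy as np

def half_mse(theta, X, y):
    """Squared-error cost J = 1/(2m) * sum((X @ theta - y)^2) (assumed cost)."""
    residual = X @ theta - y
    return float(residual @ residual) / (2 * len(y))

def poly_features(x, d):
    """Design matrix with columns [1, x, x^2, ..., x^d]."""
    return np.vander(x, d + 1, increasing=True)

def sweep_degrees(x_train, y_train, x_cv, y_cv, degrees):
    """Fit one least-squares polynomial per degree; return (d, J_train, J_CV)."""
    results = []
    for d in degrees:
        X_train = poly_features(x_train, d)
        theta, *_ = np.linalg.lstsq(X_train, y_train, rcond=None)
        j_train = half_mse(theta, X_train, y_train)
        j_cv = half_mse(theta, poly_features(x_cv, d), y_cv)
        results.append((d, j_train, j_cv))
    return results

# Hypothetical data: a noisy sine curve, split into training and CV halves.
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 40)
y = np.sin(3 * x) + rng.normal(0, 0.2, 40)
x_train, y_train = x[:20], y[:20]
x_cv, y_cv = x[20:], y[20:]

results = sweep_degrees(x_train, y_train, x_cv, y_cv, range(1, 11))
for d, j_train, j_cv in results:
    print(f"d={d:2d}  J_train={j_train:.4f}  J_CV={j_cv:.4f}")

# The degree with the lowest cross-validation error on this particular split.
best_d = min(results, key=lambda r: r[2])[0]
print("best degree by J_CV:", best_d)
```

Because each higher-degree model contains the lower-degree ones, the least-squares training error cannot go up as $d$ grows. The cross-validation error has no such guarantee, which is why it is the one used to compare degrees.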

Practical Diagnostic Rule

| Situation | $J_{\text{train}}$ | $J_{\text{CV}}$ | Diagnosis |
|---|---|---|---|
| Both high and similar | High | High | High bias |
| Large gap (train low, CV high) | Low | High | High variance |
| Both low and similar | Low | Low | Good fit |
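The rule in the table above can be written as a small helper. This is only a sketch: what counts as a "high" error depends on the problem, so the `acceptable_error` threshold is a hypothetical parameter you would have to choose yourself (for example from a baseline or human-level error), and the function name `diagnose` is made up for illustration.

```python
def diagnose(j_train, j_cv, acceptable_error):
    """Classify a (J_train, J_CV) pair using the diagnostic table.

    acceptable_error is an assumed, problem-specific threshold that
    separates "low" from "high" error; it is not part of the original rule.
    """
    train_high = j_train > acceptable_error
    cv_high = j_cv > acceptable_error

    if train_high and cv_high:
        return "high bias"        # the model underfits even the training set
    if not train_high and cv_high:
        return "high variance"    # fits training data, fails on unseen data
    if not train_high and not cv_high:
        return "good fit"
    # Training error high but CV error low is unusual; check the data split.
    return "inconclusive"

print(diagnose(0.90, 1.00, acceptable_error=0.30))
print(diagnose(0.05, 0.80, acceptable_error=0.30))
print(diagnose(0.05, 0.07, acceptable_error=0.30))
```

A single threshold is a simplification: it does not check that the two errors are "similar", so a model with both high training error and a large gap would still be reported as high bias only.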

Bias vs Variance Summary

| Concept | Meaning | Cause | Effect |
|---|---|---|---|
| High Bias | Model too simple | Too few features | Underfitting |
| High Variance | Model too complex | Too many features | Overfitting |

High Bias (Underfitting) 🦎

The model is too simple to capture the underlying pattern of the data.

Characteristics:

  • Model is too simple
  • Fails to capture structure in the data
  • Adding more data does not help much

Figure: High Bias in Model

Costs

  • $J_{\text{train}}(\Theta)$ is high

  • $J_{\text{CV}}(\Theta)$ is also high

Problem:

  • Poor training performance

    The error is high because the hypothesis cannot fit the training set.

    $J_{\text{train}}(\Theta)$ is high
  • Poor test performance

    The model also fails when a new data set is introduced.

    $J_{\text{CV}}(\Theta)$ is high
  • The model performs poorly everywhere

    $J_{\text{CV}}(\Theta) \approx J_{\text{train}}(\Theta)$

Interpretation:

  • Adding more data usually does not help much
  • Increasing model complexity ($d$) may help
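To see why more data does little for a high-bias model, one can fit a straight line to data that is actually quadratic and grow the training set. The sketch below is an illustration under assumed conditions: the target $y = x^2$ plus Gaussian noise, the training-set sizes, and the helper names are all hypothetical.

```python
import numpy as np

def half_mse(theta, X, y):
    """Squared-error cost J = 1/(2m) * sum((X @ theta - y)^2) (assumed cost)."""
    residual = X @ theta - y
    return float(residual @ residual) / (2 * len(y))

rng = np.random.default_rng(1)
NOISE_STD = 0.05  # assumed noise level of the synthetic data

def make_data(m):
    """Hypothetical quadratic target with a little Gaussian noise."""
    x = rng.uniform(-1, 1, m)
    y = x ** 2 + rng.normal(0, NOISE_STD, m)
    return x, y

x_cv, y_cv = make_data(200)
X_cv = np.vander(x_cv, 2, increasing=True)  # columns [1, x]: a straight line

curve = []
for m in (10, 50, 250, 1000):
    x_train, y_train = make_data(m)
    X_train = np.vander(x_train, 2, increasing=True)
    theta, *_ = np.linalg.lstsq(X_train, y_train, rcond=None)
    curve.append((m, half_mse(theta, X_train, y_train), half_mse(theta, X_cv, y_cv)))

# Cost an ideal model would reach if only the noise were left unexplained.
noise_floor = NOISE_STD ** 2 / 2
for m, j_train, j_cv in curve:
    print(f"m={m:4d}  J_train={j_train:.4f}  J_CV={j_cv:.4f}  noise floor={noise_floor:.5f}")
```

With the largest training set, both errors stay well above the noise floor and close to each other. A line simply cannot represent the curvature, so the remedy is a more complex model (here, $d = 2$) rather than more data.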

🪱 High Variance (Overfitting)

The model is too complex and fits the training data almost perfectly.

  • Model can bend heavily to pass through every training point.

Characteristics:

  • Model is too complex
  • Fits noise in the training data
  • Adding more data can help reduce variance
  • $J_{\text{train}}(\Theta)$ is low, but $J_{\text{CV}}(\Theta)$ is high

Problem:

  • Good training performance, i.e. low training error

    $J_{\text{train}}(\Theta)$ is low
  • Poor test performance, i.e. poor generalization to unseen data

    $J_{\text{CV}}(\Theta) \gg J_{\text{train}}(\Theta)$

Interpretation:

  • Model performs very well on training data
  • Performs poorly on unseen data
  • Large gap between training and validation error

Solutions

  1. Use a regularization term that penalizes large parameter values
  2. Reduce model complexity:
    • Reduce the number of features by manually selecting the important ones
    • Remove irrelevant variables
    • Use automated model selection methods
  3. Add more training data to help the model learn the true underlying pattern and reduce overfitting.
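As a rough illustration of the first solution, the sketch below fits a deliberately over-flexible degree-8 polynomial to 15 points with and without an L2 penalty. Everything here is an assumption for the example: the synthetic data, the degree, the values of the regularization strength `lam`, and the closed-form ridge solution used in place of gradient descent.

```python
import numpy as np

def poly_features(x, d):
    """Design matrix with columns [1, x, x^2, ..., x^d]."""
    return np.vander(x, d + 1, increasing=True)

def half_mse(theta, X, y):
    """Unregularized squared-error cost, used to report J_train and J_CV."""
    residual = X @ theta - y
    return float(residual @ residual) / (2 * len(y))

def fit_ridge(X, y, lam):
    """Minimize 1/(2m) * (sum of squared errors + lam * sum(theta[1:]**2))."""
    penalty = np.eye(X.shape[1])
    penalty[0, 0] = 0.0  # the intercept term is left unpenalized
    return np.linalg.solve(X.T @ X + lam * penalty, X.T @ y)

# Hypothetical data: 15 training points and 30 CV points from a noisy sine.
rng = np.random.default_rng(2)
x = rng.uniform(-1, 1, 45)
y = np.sin(3 * x) + rng.normal(0, 0.2, 45)
DEGREE = 8  # intentionally too flexible for 15 training points
X_train, y_train = poly_features(x[:15], DEGREE), y[:15]
X_cv, y_cv = poly_features(x[15:], DEGREE), y[15:]

fits = {}
for lam in (0.0, 0.01, 1.0):
    theta = fit_ridge(X_train, y_train, lam)
    fits[lam] = (
        half_mse(theta, X_train, y_train),
        half_mse(theta, X_cv, y_cv),
        float(np.linalg.norm(theta[1:])),
    )
    j_train, j_cv, size = fits[lam]
    print(f"lambda={lam:<5}  J_train={j_train:.4f}  J_CV={j_cv:.4f}  |theta|={size:.2f}")
```

The penalty trades a slightly higher training error for smaller parameters, which limits how sharply the curve can bend. Whether $J_{\text{CV}}$ improves depends on the data and on `lam`, so the value is normally chosen by comparing cross-validation errors.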

Key Insight

  • Bias is about model simplicity.
  • Variance is about model sensitivity to data.

Good model selection is about finding the degree $d$ that minimizes:

$J_{\text{CV}}(\Theta)$

while avoiding both underfitting and overfitting.
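Written out as a two-step procedure, the selection rule looks like this. The notation $\Theta^{(d)}$ (the parameters fitted for degree $d$) and $d^{*}$ (the selected degree) is introduced here for illustration and does not appear elsewhere in the article.

```latex
% Step 1: for each candidate degree d, fit the parameters on the training set
\Theta^{(d)} = \arg\min_{\Theta} J_{\text{train}}(\Theta)
\quad \text{(over polynomial models of degree } d \text{)}

% Step 2: choose the degree whose fitted model has the lowest CV error
d^{*} = \arg\min_{d} J_{\text{CV}}\left(\Theta^{(d)}\right)
```

The parameters are always fitted on the training set only. The cross-validation set is used solely to compare the candidate degrees.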
