
Cost Function Regularization: Balancing Bias and Variance in Machine Learning Models

Learn how cost function regularization helps prevent overfitting in machine learning models by adding a penalty term to the cost function, controlling model complexity, and improving generalization performance.

Hitesh Sahu
Written by Hitesh Sahu, a passionate developer and blogger.

Fri Feb 27 2026


⚖️ Cost Function Regularization

If a model is overfitting, we can reduce the influence of certain terms by penalizing them in the cost function. This discourages large weights.

Regularization balances:

  • Bias
  • Variance

General Regularized Cost Function

We can regularize all parameters using a single summation:

$$\min_\theta \; \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2 + \lambda \sum_{j=1}^{n} \theta_j^2$$

Where the regularization term is:

$$\lambda \sum_{j=1}^{n} \theta_j^2$$

  • $\lambda$ is the regularization parameter that controls the strength of regularization.
  • The summation runs over $j = 1$ to $n$, excluding $\theta_0$.
  • This term penalizes large values of $\theta_j$, encouraging smaller weights and thus simpler models.
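The regularized cost can be computed directly; here is a minimal NumPy sketch (the toy data and function name are illustrative; note the article's formula uses a plain $\lambda$ factor on the penalty, while some texts scale it by $\lambda/2m$):

```python
import numpy as np

def regularized_cost(theta, X, y, lam):
    """Regularized squared-error cost J(theta).

    X is assumed to carry a leading column of ones, so theta[0] is the
    intercept and is excluded from the penalty (j runs from 1 to n).
    """
    m = len(y)
    residuals = X @ theta - y
    data_term = (residuals @ residuals) / (2 * m)
    penalty = lam * np.sum(theta[1:] ** 2)  # skip theta_0
    return data_term + penalty

# Toy data: y = 2x, with an intercept column of ones
X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([2.0, 4.0, 6.0])
theta = np.array([0.0, 2.0])

print(regularized_cost(theta, X, y, lam=0.0))  # 0.0: perfect fit, no penalty
print(regularized_cost(theta, X, y, lam=1.0))  # 4.0: penalty = 1 * 2^2
```

With a perfect fit the data term vanishes, so any remaining cost is purely the weight penalty — this is exactly the pressure that pushes $\theta_j$ toward zero.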

Regularization Parameter $\lambda$

Regularization shrinks parameters: the more shrinkage you observe, the larger $\lambda$ is.

Choosing $\lambda$ correctly is essential for good generalization.

Lambda controls the curvature of the decision boundary.

Larger $\lambda$ → stronger regularization

$\lambda \to \infty$ → all parameters shrink to zero → model becomes too simple → underfitting

  • Parameter weights $\theta_j$ shrink toward zero
  • Reduces model complexity and makes the model rigid/linear
  • Underfitting may occur
    • Bias increases
    • Variance decreases

Example:

$\lambda = 1 \Rightarrow \theta = [13.01,\ 0.91]$

Smaller $\lambda$ (as $\lambda \to 0$)

$\lambda \to 0$ → no regularization → model may overfit

Weaker regularization → less penalty → large weights $\theta_j$

  • Parameter weights grow larger
  • The model becomes more complex and more flexible/curvy
  • Risk of overfitting
    • Variance increases
    • Bias decreases

Small λ → Low bias, high variance (overfitting)

Example:

$\lambda = 0.01 \Rightarrow \theta = [81.01,\ 12.00]$
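The shrinkage effect is easy to demonstrate with the closed-form ridge solution. A sketch on synthetic data (the data-generating weights and $\lambda$ values are illustrative assumptions; the intercept is omitted for simplicity, so every weight is penalized here):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
y = 3.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.1, size=50)

def ridge_fit(X, y, lam):
    # Closed-form ridge solution: (X^T X + lam * I)^-1 X^T y
    n = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n), X.T @ y)

for lam in [0.0, 1.0, 100.0]:
    print(lam, ridge_fit(X, y, lam))
```

Running this shows the weight vector's norm decreasing as $\lambda$ grows — the same pattern as the article's $\theta = [81.01, 12.00]$ versus $\theta = [13.01, 0.91]$ examples.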

What Happens If $\lambda = 0$?

  • No regularization is applied
  • The model may overfit
  • We revert to standard least squares / logistic regression

How to Choose the Best λ

To select the optimal regularization parameter:

  1. Choose candidate λ values
  2. Train models for each λ
  3. Compute cross-validation error (without regularization)
  4. Select best λ + model
  5. Evaluate once on test set

1. Create Candidate Values

Example:

$\lambda \in \{0, 0.01, 0.02, 0.04, 0.08, 0.16, 0.32, 0.64, 1.28, 2.56, 5.12, 10.24\}$

2. Train Models

For each value of λ:

  • Train model parameters Θ
  • Possibly try different model complexities (degrees, architectures, etc.)

3. Compute Cross-Validation Error

Evaluate using:

$J_{CV}(\Theta)$

Important:

  • Compute cross-validation error without regularization
  • That means use λ = 0 when evaluating

This ensures fair comparison between models.

4. Select Best Combination

Choose the model and λ that produce the lowest cross-validation error.

5. Final Evaluation

Using the best:

  • Θ
  • λ

Evaluate on the test set:

$J_{test}(\Theta)$

This measures generalization performance.
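The five steps above can be sketched end to end in NumPy. Everything about the data here is an illustrative assumption (a degree-6 polynomial feature map on synthetic 1-D data with a train/CV/test split); the candidate $\lambda$ grid is the one from the article:

```python
import numpy as np

rng = np.random.default_rng(1)

def features(x, degree=6):
    # Hypothetical polynomial feature map: columns 1, x, x^2, ..., x^degree
    return np.column_stack([x ** d for d in range(degree + 1)])

x = rng.uniform(-1, 1, size=60)
y = np.sin(2 * x) + rng.normal(scale=0.2, size=60)

# Step 0: split into train / cross-validation / test sets
X = features(x)
X_train, y_train = X[:30], y[:30]
X_cv, y_cv = X[30:45], y[30:45]
X_test, y_test = X[45:], y[45:]

def train(X, y, lam):
    # Regularized normal equations; theta_0 is not penalized
    n = X.shape[1]
    reg = lam * np.eye(n)
    reg[0, 0] = 0.0
    return np.linalg.solve(X.T @ X + reg, X.T @ y)

def cost(theta, X, y):
    # Unregularized squared error -- i.e. lambda = 0 when evaluating
    r = X @ theta - y
    return (r @ r) / (2 * len(y))

# Steps 1-4: train per lambda, score on the CV set, pick the best
lambdas = [0, 0.01, 0.02, 0.04, 0.08, 0.16, 0.32, 0.64, 1.28, 2.56, 5.12, 10.24]
results = [(lam, cost(train(X_train, y_train, lam), X_cv, y_cv)) for lam in lambdas]
best_lam, best_cv = min(results, key=lambda t: t[1])

# Step 5: evaluate the chosen model once on the test set
theta_best = train(X_train, y_train, best_lam)
print("best lambda:", best_lam, "J_test:", cost(theta_best, X_test, y_test))
```

Note that `cost` deliberately contains no penalty term, matching step 3's rule that cross-validation (and test) error are computed with $\lambda = 0$.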


Example: Polynomial Hypothesis

Consider the function:

$\theta_0 + \theta_1 x + \theta_2 x^2 + \theta_3 x^3 + \theta_4 x^4$

If we want the model to behave more like a quadratic function, we can reduce the influence of:

$\theta_3 x^3 \quad \text{and} \quad \theta_4 x^4$

Instead of removing these features, we modify the cost function.

Regularized Cost Function

We minimize:

$$\min_\theta \; \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2 + 1000\,\theta_3^2 + 1000\,\theta_4^2$$

Effect of Large Penalty

Adding large penalty terms forces:

$\theta_3 \approx 0 \quad \text{and} \quad \theta_4 \approx 0$

This reduces the contribution of:

$\theta_3 x^3 \quad \text{and} \quad \theta_4 x^4$

As a result:

  • The hypothesis becomes smoother
  • Overfitting decreases
  • The curve behaves more like a quadratic function
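This targeted penalty can be checked numerically. A sketch solving the regularized normal equations with per-parameter penalties on synthetic, truly quadratic data (all names and data choices here are illustrative; the factor $2\,\mathrm{diag}(p)$ comes from differentiating $p_j \theta_j^2$):

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.uniform(-1, 1, size=100)
# Ground truth is quadratic: 1 + 2x - 1.5x^2 plus noise
y = 1.0 + 2.0 * x - 1.5 * x**2 + rng.normal(scale=0.1, size=100)

X = np.column_stack([x**d for d in range(5)])  # features: 1, x, x^2, x^3, x^4
m = len(y)

# Per-parameter penalties: 1000 on theta_3 and theta_4 only, as in the article
penalty = np.array([0.0, 0.0, 0.0, 1000.0, 1000.0])

# Minimizer of (1/2m)||X theta - y||^2 + sum_j p_j theta_j^2 via normal equations
theta = np.linalg.solve(X.T @ X / m + 2 * np.diag(penalty), X.T @ y / m)
print(theta)  # theta[3] and theta[4] are driven near zero
```

Because the huge diagonal entries dominate the corresponding rows of the system, $\theta_3$ and $\theta_4$ land very close to zero while $\theta_0, \theta_1, \theta_2$ stay free to fit the quadratic signal.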