Evaluating a Hypothesis in Neural Networks
📋 Evaluating a Hypothesis
A model that fits the training data very well is not necessarily a good hypothesis.
A model can have low training error but still perform poorly on new data due to overfitting:
- Low training error
- High error on unseen data
Choosing Between Multiple Models
Suppose we are trying polynomial regression with hypotheses of different degrees $d$:

- $d = 1$: $h_\theta(x) = \theta_0 + \theta_1 x$
- $d = 2$: $h_\theta(x) = \theta_0 + \theta_1 x + \theta_2 x^2$
- and so on, up to some maximum degree

Each degree defines a different hypothesis class.
We need a principled way to choose the best without biasing our evaluation.
A good model has:
- Low training error
- Low test error
If training error is low but test error is high, the model is overfitting.
To properly select a model:
- Train parameters on the training set
- Choose model complexity using the cross-validation set
- Report final performance using the test set
Splitting the Dataset
To properly evaluate performance, we split the dataset into three parts. A common split is:
- Training set: 60%
- Cross-validation set: 20%
- Test set: 20%
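The 60/20/20 split above can be sketched in NumPy; `split_dataset` is a hypothetical helper name, not part of any library:

```python
import numpy as np

def split_dataset(X, y, seed=0):
    """Shuffle and split into 60% train / 20% cross-validation / 20% test."""
    rng = np.random.default_rng(seed)
    m = X.shape[0]
    idx = rng.permutation(m)           # shuffle before splitting
    n_train = int(0.6 * m)
    n_cv = int(0.2 * m)
    train = idx[:n_train]
    cv = idx[n_train:n_train + n_cv]
    test = idx[n_train + n_cv:]
    return (X[train], y[train]), (X[cv], y[cv]), (X[test], y[test])
```

Shuffling first matters: if the data is ordered (say, by date or by class), a plain slice would give unrepresentative splits.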
1. 📚 Training set
Typically 60-70% of the data
- Training error tells us how well the model fits known data.
Used to learn the parameters $\theta$ by minimizing the training error:

$$J_{train}(\theta) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$$

- using only the training set.
2. 📘 Cross Validation Set
Used for Model Selection (Validation)
For each trained model $\theta^{(d)}$, compute:

$$J_{cv}(\theta^{(d)}) = \frac{1}{2m_{cv}} \sum_{i=1}^{m_{cv}} \left( h_{\theta^{(d)}}(x_{cv}^{(i)}) - y_{cv}^{(i)} \right)^2$$

using the cross-validation set.
Choose the polynomial degree with the lowest cross-validation error:

$$d^* = \arg\min_{d} \; J_{cv}(\theta^{(d)})$$
This selects the model that generalizes best among the candidates.
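The selection loop above can be sketched with NumPy's polynomial fitting; `select_degree` is a hypothetical helper name, and the squared-error cost follows the $J_{cv}$ formula:

```python
import numpy as np

def select_degree(x_train, y_train, x_cv, y_cv, max_degree=10):
    """Fit one polynomial per degree on the training set,
    then pick the degree with the lowest cross-validation error."""
    best_d, best_err, best_coeffs = None, np.inf, None
    for d in range(1, max_degree + 1):
        coeffs = np.polyfit(x_train, y_train, d)      # learn parameters on train
        pred = np.polyval(coeffs, x_cv)               # predict on the CV set
        j_cv = np.mean((pred - y_cv) ** 2) / 2        # squared-error CV cost
        if j_cv < best_err:
            best_d, best_err, best_coeffs = d, j_cv, coeffs
    return best_d, best_coeffs
```

Note that the CV set is used only to compare degrees; the parameters themselves are never fit on it.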
3. 📗 Test set
The remaining 20-30% of the data
- Test error tells us how well the model generalizes.
After choosing $d^*$, estimate the generalization error using:

$$J_{test}(\theta^{(d^*)}) = \frac{1}{2m_{test}} \sum_{i=1}^{m_{test}} \left( h_{\theta^{(d^*)}}(x_{test}^{(i)}) - y_{test}^{(i)} \right)^2$$
The test set is used only once, at the very end; it must remain untouched until then.
Test Set Error Examples
1. Linear Regression
For linear regression, the test error is:

$$J_{test}(\theta) = \frac{1}{2m_{test}} \sum_{i=1}^{m_{test}} \left( h_\theta(x_{test}^{(i)}) - y_{test}^{(i)} \right)^2$$

where:
- $m_{test}$ is the number of test examples
- $h_\theta$ is the hypothesis function
This measures the average squared error on unseen data.
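This cost can be computed directly from the formula above; `j_test_linear` is a hypothetical helper name, and `X_test` is assumed to include a bias column so that $h_\theta(x) = \theta^T x$:

```python
import numpy as np

def j_test_linear(theta, X_test, y_test):
    """Average squared test error: (1 / (2 * m_test)) * sum((h(x) - y)^2),
    where h(x) = X @ theta for linear regression with a bias column."""
    m_test = X_test.shape[0]
    residuals = X_test @ theta - y_test
    return np.sum(residuals ** 2) / (2 * m_test)
```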
2. Classification Logistic Regression
Given a training set, learn the parameter vector $\theta$ by minimizing the logistic regression cost function:

$$J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log h_\theta(x^{(i)}) + (1 - y^{(i)}) \log\left(1 - h_\theta(x^{(i)})\right) \right]$$

where

$$h_\theta(x) = \frac{1}{1 + e^{-\theta^T x}}$$
After learning $\theta$ using the training set, evaluate performance on the test set.
The test set cost is:

$$J_{test}(\theta) = -\frac{1}{m_{test}} \sum_{i=1}^{m_{test}} \left[ y_{test}^{(i)} \log h_\theta(x_{test}^{(i)}) + (1 - y_{test}^{(i)}) \log\left(1 - h_\theta(x_{test}^{(i)})\right) \right]$$
Important:
- $\theta$ is not retrained on the test set.
- We simply plug the learned $\theta$ into the test cost formula.
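A minimal sketch of this evaluation, assuming a bias column in `X_test` and a hypothetical helper name `j_test_logistic`:

```python
import numpy as np

def j_test_logistic(theta, X_test, y_test):
    """Logistic test cost: -(1/m_test) * sum(y*log(h) + (1-y)*log(1-h)),
    with h(x) = sigmoid(theta^T x). theta is NOT retrained here."""
    h = 1.0 / (1.0 + np.exp(-(X_test @ theta)))   # sigmoid hypothesis
    return -np.mean(y_test * np.log(h) + (1 - y_test) * np.log(1 - h))
```

In practice one would clip `h` away from exactly 0 or 1 to avoid `log(0)`.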
Misclassification error
For classification, we often use misclassification error (also called 0/1 error).
Define:

$$err\left(h_\theta(x), y\right) = \begin{cases} 1 & \text{if } h_\theta(x) \ge 0.5,\ y = 0 \ \text{ or } \ h_\theta(x) < 0.5,\ y = 1 \\ 0 & \text{otherwise} \end{cases}$$

This gives:
- 1 for an incorrect prediction
- 0 for a correct prediction
Classification Average Test Error
The overall test error is:

$$\text{Test Error} = \frac{1}{m_{test}} \sum_{i=1}^{m_{test}} err\left(h_\theta(x_{test}^{(i)}), y_{test}^{(i)}\right)$$
This gives the proportion of test examples that were misclassified.
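The 0/1 error above can be computed in a few lines; `misclassification_error` is a hypothetical helper name, and `X_test` is again assumed to include a bias column:

```python
import numpy as np

def misclassification_error(theta, X_test, y_test):
    """Fraction of test examples where the thresholded prediction
    (h >= 0.5 -> 1, else 0) disagrees with the true label."""
    h = 1.0 / (1.0 + np.exp(-(X_test @ theta)))
    predictions = (h >= 0.5).astype(int)
    return np.mean(predictions != y_test)
```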
Error Analysis
A practical and effective approach to solving machine learning problems is:
- Start with a simple algorithm
- Implement it quickly
- Evaluate it early using cross-validation data
Avoid over-engineering before you understand where the model is failing.
Step 1 — Plot Learning Curves
Learning curves help answer questions like:
- Would more training data help?
- Is the model suffering from high bias?
- Is it suffering from high variance?
- Would more features improve performance?
They give direction before investing more time.
Step 2 — Manually Inspect Errors
After evaluating on the cross-validation set:
- Look at misclassified examples
- Try to identify patterns in the errors
Example
Suppose:
- 500 total emails
- 100 misclassified
Instead of guessing improvements, manually inspect those 100 emails.
You might categorize them:
- Phishing emails
- Promotional emails
- Personal emails
- Password theft attempts
If most errors are password-theft emails, that suggests the model is missing features specific to that category.
You could then:
- Add features related to suspicious links
- Add features related to urgent security language
- Detect specific keyword patterns
Step 3 — Try Improvements Systematically
Every time you introduce a change:
- Add a feature
- Apply stemming
- Modify preprocessing
- Adjust regularization
You must measure the impact using a single numerical metric.
Without a numerical value, you cannot objectively compare changes.
Example: Stemming
Stemming treats variations of a word as the same root: *fail*, *failing*, *failed* → *fail*.
If stemming lowers the cross-validation error rate, that is an improvement. Keep it.
Example: Case Sensitivity
Suppose distinguishing between uppercase and lowercase increases the error rate. That is worse. Do not keep the feature.
Core Principle
Always:
- Make one change at a time
- Measure cross-validation error
- Keep only changes that reduce error
Avoid guessing. Let the data guide decisions.
When troubleshooting prediction errors, common options include:
- Getting more training examples
- Trying smaller sets of features
- Adding new features
- Trying polynomial features
- Increasing or decreasing the regularization parameter $\lambda$

Before investing in any of these, we need a reliable way to evaluate the new hypothesis.
Key Insight
Error analysis turns machine learning from random tweaking into a systematic engineering process.
Instead of asking:
"What should I try next?"
You ask:
"Where is the model failing, and why?"
Then improve it in a targeted way.
