Evaluating a Hypothesis in Neural Networks
Learn how to evaluate a hypothesis in machine learning: split data into training, cross-validation, and test sets, compute training and test error, diagnose bias and variance, and use error analysis to decide what to improve next.
📋 Evaluating a Hypothesis
A model that fits the training data very well is not necessarily a good hypothesis.
A model can have low training error but still perform poorly on new data due to overfitting:
- Low training error
- High error on unseen data
Choosing Between Multiple Models
Suppose we are trying polynomial regression with different degrees $d = 1, 2, 3, \dots$:

$$h_\theta(x) = \theta_0 + \theta_1 x + \theta_2 x^2 + \dots + \theta_d x^d$$
Each degree defines a different hypothesis class.
We need a principled way to choose the best degree without biasing our estimate of how well the chosen model generalizes.
A good model has:
- Low training error
- Low test error
If training error is low but test error is high, the model is overfitting.
To properly select a model:
- Train parameters on the training set
- Choose model complexity using the cross-validation set
- Report final performance using the test set
Splitting the Dataset
To properly evaluate performance, we split the dataset into three subsets. A common split is:
- Training set: 60%
- Cross-validation set: 20%
- Test set: 20%
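The 60/20/20 split above can be sketched as a shuffle followed by slicing. This is a minimal illustration assuming NumPy; the function name `split_dataset` and its `seed` parameter are hypothetical, not part of any library. Shuffling first matters because data files are often sorted (by date, label, or source), and slicing unshuffled data would give unrepresentative subsets.

```python
import numpy as np

def split_dataset(X, y, train_frac=0.6, cv_frac=0.2, seed=0):
    """Shuffle (X, y) and split into training, cross-validation and test sets.

    Whatever fraction is left over (1 - train_frac - cv_frac) becomes the test set.
    """
    X = np.asarray(X)
    y = np.asarray(y)
    m = len(y)
    rng = np.random.default_rng(seed)
    idx = rng.permutation(m)  # random order so each subset is representative
    n_train = int(round(train_frac * m))
    n_cv = int(round(cv_frac * m))
    train_idx = idx[:n_train]
    cv_idx = idx[n_train:n_train + n_cv]
    test_idx = idx[n_train + n_cv:]
    return ((X[train_idx], y[train_idx]),
            (X[cv_idx], y[cv_idx]),
            (X[test_idx], y[test_idx]))

# Usage: 100 examples split 60 / 20 / 20
X = np.arange(100).reshape(-1, 1)
y = np.arange(100)
(X_train, y_train), (X_cv, y_cv), (X_test, y_test) = split_dataset(X, y)
print(len(y_train), len(y_cv), len(y_test))  # 60 20 20
```

Every example lands in exactly one subset, so no test example can leak into training.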
Train and Evaluate the Model
- Split the data into training and test sets.
- Train the model by minimizing the regularized cost $J(\theta)$.
- Obtain the optimal parameter vector $\theta$.
- Compute the test error on unseen data.
- Compare training and test errors to diagnose:
  - High Bias (Underfitting)
  - High Variance (Overfitting)
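The comparison in the last step can be written as a rough rule of thumb: a high training error points to high bias, while a large gap between held-out (cross-validation or test) error and training error points to high variance. The sketch below is an illustration only; `target_error` and `gap_tolerance` are hypothetical, problem-specific thresholds that you would have to choose yourself.

```python
def diagnose(train_error, heldout_error, target_error, gap_tolerance):
    """Rough bias/variance diagnosis from training vs. held-out error.

    target_error: error level you would accept (hypothetical, problem-specific).
    gap_tolerance: largest acceptable held-out minus training gap (hypothetical).
    """
    high_bias = train_error > target_error          # cannot even fit the training data
    high_variance = (heldout_error - train_error) > gap_tolerance  # fails to generalize
    if high_bias and high_variance:
        return "high bias and high variance"
    if high_bias:
        return "high bias (underfitting)"
    if high_variance:
        return "high variance (overfitting)"
    return "looks fine"

print(diagnose(0.30, 0.32, target_error=0.10, gap_tolerance=0.05))  # high bias (underfitting)
print(diagnose(0.02, 0.25, target_error=0.10, gap_tolerance=0.05))  # high variance (overfitting)
```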
Data Sets
1. 📚 Training set
Typically 60-70% of the data
- Training error tells us how well the model fits known data.
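The parameters are learned using only the training set, by minimizing the regularized training cost:

$$J(\theta) = \frac{1}{2m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2 + \frac{\lambda}{2m}\sum_{j=1}^{n}\theta_j^2$$

where:
- $m$ = number of training examples
- $n$ = number of features
- $\lambda$ = regularization parameter
- $h_\theta(x^{(i)})$ = model prediction for the $i$-th example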
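The regularized linear-regression cost $J(\theta) = \frac{1}{2m}\sum_i (h_\theta(x^{(i)}) - y^{(i)})^2 + \frac{\lambda}{2m}\sum_{j=1}^{n}\theta_j^2$ translates directly into NumPy. This minimal sketch assumes `X` already contains a leading column of ones, so `theta[0]` is the intercept; the penalty sums from $j = 1$ and therefore skips it. The name `regularized_cost` is illustrative.

```python
import numpy as np

def regularized_cost(theta, X, y, lam):
    """Regularized linear-regression cost J(theta).

    X is an (m, n+1) matrix whose first column is all ones (the bias feature),
    so theta[0] is the intercept and is not regularized.
    """
    m = len(y)
    predictions = X @ theta                            # h_theta(x^(i)) for every example
    squared_error = np.sum((predictions - y) ** 2) / (2 * m)
    penalty = lam * np.sum(theta[1:] ** 2) / (2 * m)   # sum over j = 1..n, skipping theta_0
    return squared_error + penalty

X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([2.0, 4.0, 6.0])
theta = np.array([0.0, 2.0])                    # fits y = 2x exactly
print(regularized_cost(theta, X, y, lam=0.0))   # 0.0
print(regularized_cost(theta, X, y, lam=3.0))   # 2.0
```

Even a perfect fit has a nonzero cost once $\lambda > 0$; that penalty is what discourages large weights during training.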
2. 📘 Cross Validation Set
Used for Model Selection (Validation)
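For each trained model $\theta^{(d)}$, compute the cross-validation error using the cross-validation set:

$$J_{cv}(\theta^{(d)}) = \frac{1}{2m_{cv}}\sum_{i=1}^{m_{cv}}\left(h_{\theta^{(d)}}(x_{cv}^{(i)}) - y_{cv}^{(i)}\right)^2$$

Choose the polynomial degree with the lowest cross-validation error:

$$d^* = \arg\min_{d} J_{cv}(\theta^{(d)})$$

This selects the model that generalizes best among the candidates.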
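The selection rule $d^* = \arg\min_d J_{cv}(\theta^{(d)})$ can be sketched as a loop: fit one model per degree on the training set, score each on the cross-validation set, and keep the best. This assumes NumPy, uses an unregularized least-squares fit for brevity, and runs on synthetic quadratic data; all function names are illustrative.

```python
import numpy as np

def poly_features(x, d):
    """Map a 1-D input to columns [1, x, x^2, ..., x^d]."""
    return np.vander(x, d + 1, increasing=True)

def fit(x, y, d):
    # Least-squares fit of a degree-d polynomial (no regularization, for brevity).
    theta, *_ = np.linalg.lstsq(poly_features(x, d), y, rcond=None)
    return theta

def squared_error(theta, x, y, d):
    # (1 / 2m) * sum of squared residuals, with no regularization term.
    residuals = poly_features(x, d) @ theta - y
    return np.sum(residuals ** 2) / (2 * len(y))

def select_degree(x_train, y_train, x_cv, y_cv, degrees):
    """Train one model per degree on the training set; pick the lowest J_cv."""
    cv_errors = {}
    for d in degrees:
        theta = fit(x_train, y_train, d)
        cv_errors[d] = squared_error(theta, x_cv, y_cv, d)
    best_d = min(cv_errors, key=cv_errors.get)
    return best_d, cv_errors

# Synthetic data from a quadratic plus a little noise (illustrative only).
rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, 60)
y = 1.0 - 2.0 * x + 0.5 * x ** 2 + rng.normal(0, 0.3, x.size)
best_d, cv_errors = select_degree(x[:40], y[:40], x[40:], y[40:], degrees=range(1, 8))
print(best_d, cv_errors[best_d])
```

The test set is deliberately absent here: it is only touched after `best_d` has been fixed.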
Important
The cross-validation and test errors do not include the regularization term.
Regularization is used only during training to learn the parameters.
3. 📗 Test set
Typically 20-30% of the data
- Test error tells us how well the model generalizes.
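After choosing $d^*$, estimate the generalization error on the test set:

$$J_{test}(\theta^{(d^*)}) = \frac{1}{2m_{test}}\sum_{i=1}^{m_{test}}\left(h_{\theta^{(d^*)}}(x_{test}^{(i)}) - y_{test}^{(i)}\right)^2$$

The test set must remain untouched until the very end and is used only once. If it were also used to choose $d$, the reported error would be optimistically biased, because $d$ would have been fitted to the test data.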
Test Set Error Examples
1. Linear Regression
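For linear regression, the test error is:

$$J_{test}(\theta) = \frac{1}{2m_{test}}\sum_{i=1}^{m_{test}}\left(h_\theta(x_{test}^{(i)}) - y_{test}^{(i)}\right)^2$$

where:
- $m_{test}$ is the number of test examples
- $h_\theta(x)$ is the hypothesis function

This measures (half of) the average squared error on unseen data.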
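A minimal sketch of $J_{test}$ for linear regression, assuming NumPy and an `X_test` matrix with a leading column of ones. The key point is that `theta` comes from training and is only evaluated here, with no penalty term. The function name is illustrative.

```python
import numpy as np

def linear_test_error(theta, X_test, y_test):
    """J_test(theta) for linear regression, with no regularization term.

    X_test includes a leading column of ones; theta was learned on the training set.
    """
    m_test = len(y_test)
    residuals = X_test @ theta - y_test
    return np.sum(residuals ** 2) / (2 * m_test)

theta = np.array([0.0, 2.0])                     # learned earlier: h(x) = 2x
X_test = np.array([[1.0, 1.0], [1.0, 2.0]])
y_test = np.array([2.5, 3.0])
print(linear_test_error(theta, X_test, y_test))  # 0.3125
```

Worked by hand: the predictions are 2 and 4, the residuals are -0.5 and 1.0, and $(0.25 + 1.0) / (2 \cdot 2) = 0.3125$.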
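2. Classification (Logistic Regression)
Given a training set, learn the parameter vector $\theta$ by minimizing the logistic regression cost function:

$$J(\theta) = -\frac{1}{m}\sum_{i=1}^{m}\left[y^{(i)}\log h_\theta(x^{(i)}) + \left(1 - y^{(i)}\right)\log\left(1 - h_\theta(x^{(i)})\right)\right]$$

where $h_\theta(x) = \dfrac{1}{1 + e^{-\theta^T x}}$.
After learning $\theta$ using the training set, evaluate performance on the test set.
The test set cost is:

$$J_{test}(\theta) = -\frac{1}{m_{test}}\sum_{i=1}^{m_{test}}\left[y_{test}^{(i)}\log h_\theta(x_{test}^{(i)}) + \left(1 - y_{test}^{(i)}\right)\log\left(1 - h_\theta(x_{test}^{(i)})\right)\right]$$

Important:
- $\theta$ is not retrained on the test set.
- We simply plug the learned $\theta$ into the test cost formula.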
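Plugging a learned $\theta$ into the logistic test cost can be sketched as follows, assuming NumPy. The small `eps` clip is an assumption added for numerical safety (it keeps `log` finite when a prediction is exactly 0 or 1); it is not part of the formula itself.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logistic_test_cost(theta, X_test, y_test, eps=1e-12):
    """J_test(theta) for logistic regression, using theta learned on the training set.

    eps clips probabilities away from 0 and 1 so log() stays finite.
    """
    h = np.clip(sigmoid(X_test @ theta), eps, 1 - eps)
    m_test = len(y_test)
    return -np.sum(y_test * np.log(h) + (1 - y_test) * np.log(1 - h)) / m_test

theta = np.array([0.0, 0.0])   # predicts 0.5 for every example
X_test = np.array([[1.0, 3.0], [1.0, -1.0]])
y_test = np.array([1.0, 0.0])
print(logistic_test_cost(theta, X_test, y_test))  # ln 2 ≈ 0.6931
```

A model that always outputs 0.5 scores $\ln 2$; any useful classifier should do better than that baseline.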
Misclassification error
For classification, we often use misclassification error (also called 0/1 error).
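Define:

$$\text{err}(h_\theta(x), y) = \begin{cases} 1 & \text{if } h_\theta(x) \ge 0.5 \text{ and } y = 0, \text{ or } h_\theta(x) < 0.5 \text{ and } y = 1 \\ 0 & \text{otherwise} \end{cases}$$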
This gives:
- 1 for an incorrect prediction
- 0 for a correct prediction
Classification Average Test Error
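The overall test error is:

$$\text{Test error} = \frac{1}{m_{test}}\sum_{i=1}^{m_{test}}\text{err}\left(h_\theta(x_{test}^{(i)}), y_{test}^{(i)}\right)$$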
This gives the proportion of test examples that were misclassified.
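The 0/1 error and its average fit in a few lines. This sketch assumes NumPy, takes predicted probabilities $h_\theta(x)$ as input, and uses the 0.5 threshold from the definition of `err`; the function name is illustrative.

```python
import numpy as np

def misclassification_error(probabilities, y, threshold=0.5):
    """Average 0/1 error: fraction of examples whose predicted label is wrong."""
    predictions = (np.asarray(probabilities) >= threshold).astype(int)
    errors = (predictions != np.asarray(y)).astype(int)  # err = 1 if wrong, 0 if right
    return errors.mean()

h = [0.9, 0.2, 0.6, 0.4]   # h_theta(x) on four test examples
y = [1, 0, 0, 1]
print(misclassification_error(h, y))  # 0.5
```

The third example (0.6, label 0) and the fourth (0.4, label 1) are wrong, so 2 of 4 are misclassified.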
Error Analysis
A practical and effective approach to solving machine learning problems is:
- Start with a simple algorithm
- Implement it quickly
- Evaluate it early using cross-validation data
Avoid over-engineering before you understand where the model is failing.
Step 1 — Plot Learning Curves
Learning curves help answer questions like:
- Would more training data help?
- Is the model suffering from high bias?
- Is it suffering from high variance?
- Would more features improve performance?
They point you in the right direction before you invest more time.
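A learning curve is obtained by training on the first $m$ examples for increasing $m$ and recording both $J_{train}$ (on those $m$ examples) and $J_{cv}$ (on the full cross-validation set). The sketch below assumes NumPy, a linear model fitted with the regularized normal equation, and synthetic data; the function names are illustrative. Plotting `sizes` against both error lists (for example with matplotlib) gives the curves.

```python
import numpy as np

def fit_linear(X, y, lam=0.0):
    """Regularized normal equation; X has a leading column of ones."""
    n = X.shape[1]
    reg = lam * np.eye(n)
    reg[0, 0] = 0.0                      # do not regularize the intercept
    return np.linalg.solve(X.T @ X + reg, X.T @ y)

def error(theta, X, y):
    # Unregularized squared error, as used for J_train and J_cv.
    return np.sum((X @ theta - y) ** 2) / (2 * len(y))

def learning_curve(X_train, y_train, X_cv, y_cv, lam=0.0):
    """Train on the first m examples for growing m; record J_train and J_cv."""
    sizes, train_errors, cv_errors = [], [], []
    for m in range(2, len(y_train) + 1):
        theta = fit_linear(X_train[:m], y_train[:m], lam)
        sizes.append(m)
        train_errors.append(error(theta, X_train[:m], y_train[:m]))  # only the m examples used
        cv_errors.append(error(theta, X_cv, y_cv))                    # always the full CV set
    return sizes, train_errors, cv_errors

# Synthetic linear data with noise (illustrative only).
rng = np.random.default_rng(1)
x = rng.uniform(0, 10, 50)
y = 3.0 + 2.0 * x + rng.normal(0, 1.0, x.size)
X = np.column_stack([np.ones_like(x), x])
sizes, j_train, j_cv = learning_curve(X[:30], y[:30], X[30:], y[30:])
print(sizes[-1], round(j_train[-1], 2), round(j_cv[-1], 2))
```

Typical readings: if both curves converge to a high error, more data will not help (high bias); if a large gap persists between them, more data is likely to help (high variance).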
Step 2 — Manually Inspect Errors
After evaluating on the cross-validation set:
- Look at misclassified examples
- Try to identify patterns in the errors
Example
Suppose:
- 500 total emails
- 100 misclassified
Instead of guessing improvements, manually inspect those 100 emails.
You might categorize them:
- Phishing emails
- Promotional emails
- Personal emails
- Password theft attempts
If most errors are password-theft emails, that suggests the model is missing features specific to that category.
You could then:
- Add features related to suspicious links
- Add features related to urgent security language
- Detect specific keyword patterns
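Once a person has read the misclassified emails and tagged each with a category, tallying the tags shows where to focus. This is a minimal sketch using Python's `collections.Counter`; the category labels below are hypothetical, made up only to show the mechanics.

```python
from collections import Counter

def summarize_errors(labels):
    """Count hand-assigned error categories, most frequent first."""
    counts = Counter(labels)
    total = len(labels)
    return [(category, n, n / total) for category, n in counts.most_common()]

# Hypothetical categories a person assigned while reading misclassified emails.
labels = ["password theft", "promotional", "password theft",
          "personal", "password theft", "phishing"]
summary = summarize_errors(labels)
for category, n, share in summary:
    print(f"{category:15s} {n:3d}  {share:.0%}")
```

The category at the top of the list is the one where new features are most likely to pay off.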
Step 3 — Try Improvements Systematically
Every time you introduce a change:
- Add a feature
- Apply stemming
- Modify preprocessing
- Adjust regularization
You must measure the impact using a single numerical metric.
Without a numerical value, you cannot objectively compare changes.
Example: Stemming
Stemming maps variations of a word to the same root, e.g. fail, failing, failed → fail.
If the cross-validation error rate drops noticeably after adding stemming, that is a strong improvement. Keep it.
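To make the idea concrete, here is a deliberately crude suffix stripper. It is a toy illustration, not the Porter stemmer or any library's algorithm: it removes only a few English endings and keeps at least three characters of the root. Real stemmers handle many more cases, and can also merge words that should stay distinct, which is exactly why the change has to be validated on cross-validation error.

```python
def crude_stem(word):
    """Toy suffix stripper (NOT the Porter stemmer): removes a few common endings."""
    word = word.lower()
    for suffix in ("ing", "ed", "s"):
        # Strip the suffix only if a root of at least 3 characters remains.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

words = ["fail", "failing", "failed", "fails"]
print({crude_stem(w) for w in words})  # {'fail'}
```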
Example: Case Sensitivity
Suppose distinguishing between uppercase and lowercase increases the cross-validation error.
That is worse, so do not keep the feature.
Core Principle
Always:
- Make one change at a time
- Measure cross-validation error
- Keep only changes that reduce error
Avoid guessing. Let the data guide decisions.
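The core principle can be sketched as a greedy loop: try each candidate change on top of the changes already kept, and keep it only if the cross-validation error decreases. The `evaluate` callback stands in for "retrain and measure CV error" and is hypothetical, as are the effect sizes in the usage example, which are illustrative numbers rather than real results.

```python
def greedy_keep_improvements(baseline_error, candidates, evaluate):
    """Try one change at a time; keep it only if the CV error goes down.

    candidates: list of change names.
    evaluate(changes): hypothetical callback that retrains with `changes`
    applied and returns the cross-validation error.
    """
    kept = []
    best = baseline_error
    for change in candidates:
        err = evaluate(kept + [change])   # one new change on top of what was kept
        if err < best:
            kept.append(change)
            best = err
    return kept, best

# Hypothetical effect of each change on CV error (illustrative numbers only).
effects = {"stemming": -0.02, "case sensitivity": +0.005, "suspicious-link feature": -0.01}

def fake_evaluate(changes):
    return 0.10 + sum(effects[c] for c in changes)

kept, best = greedy_keep_improvements(0.10, list(effects), fake_evaluate)
print(kept)  # ['stemming', 'suspicious-link feature']
```

Because each step adds exactly one change, any movement in the error can be attributed to that change alone.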
When troubleshooting prediction errors by:
- Getting more training examples
- Trying smaller sets of features
- Adding new features
- Trying polynomial features
- Increasing or decreasing $\lambda$
we need a reliable way to evaluate each new hypothesis.
Key Insight
Error analysis turns machine learning from random tweaking into a systematic engineering process.
Instead of asking:
"What should I try next?"
You ask:
"Where is the model failing, and why?"
Then improve it in a targeted way.
