Evaluating a Hypothesis in Neural Networks
📋 Evaluating a Hypothesis
A model that fits the training data very well is not necessarily a good hypothesis.
A model can have low training error but still perform poorly on new data due to overfitting:
- Low training error
- High error on unseen data
Choosing Between Multiple Models
Suppose we are trying polynomial regression with hypotheses of different degrees $d$:

- $d = 1$: $h_\theta(x) = \theta_0 + \theta_1 x$
- $d = 2$: $h_\theta(x) = \theta_0 + \theta_1 x + \theta_2 x^2$
- and so on, up to some maximum degree

Each degree defines a different hypothesis class.
We need a principled way to choose the best without biasing our evaluation.
A good model has:
- Low training error
- Low test error
If training error is low but test error is high, the model is overfitting.
To properly select a model:
- Train parameters on the training set
- Choose model complexity using the cross-validation set
- Report final performance using the test set
Splitting the Dataset
To properly evaluate performance, we split the dataset into three parts. A common split is:
- Training set: 60%
- Cross-validation set: 20%
- Test set: 20%
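The 60/20/20 split above can be sketched in NumPy; `split_dataset` is a hypothetical helper name, not part of any library:

```python
import numpy as np

def split_dataset(X, y, seed=0):
    """Shuffle and split into 60% train / 20% cross-validation / 20% test."""
    rng = np.random.default_rng(seed)
    m = X.shape[0]
    idx = rng.permutation(m)           # shuffle before splitting
    n_train = int(0.6 * m)
    n_cv = int(0.2 * m)
    train = idx[:n_train]
    cv = idx[n_train:n_train + n_cv]
    test = idx[n_train + n_cv:]
    return (X[train], y[train]), (X[cv], y[cv]), (X[test], y[test])
```

Shuffling first matters: if the data is ordered (say, by date or by class), a plain slice would give unrepresentative splits.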
1. 📚 Training set
Typically 60-70% of the data
- Training error tells us how well the model fits known data.
Used to learn the parameters $\theta$ by minimizing the training error:

$$J_{train}(\theta) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$$

- using only the training set.
2. 📘 Cross Validation Set
Used for Model Selection (Validation)
For each trained model $\theta^{(d)}$, compute:

$$J_{cv}(\theta^{(d)}) = \frac{1}{2m_{cv}} \sum_{i=1}^{m_{cv}} \left( h_{\theta^{(d)}}(x_{cv}^{(i)}) - y_{cv}^{(i)} \right)^2$$

using the cross-validation set.
Choose the polynomial degree with the lowest cross-validation error:

$$d^* = \arg\min_{d} \; J_{cv}(\theta^{(d)})$$
This selects the model that generalizes best among the candidates.
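The selection loop above can be sketched with NumPy's polynomial fitting; `select_degree` is a hypothetical helper name, and the squared-error cost follows the $J_{cv}$ formula:

```python
import numpy as np

def select_degree(x_train, y_train, x_cv, y_cv, max_degree=10):
    """Fit one polynomial per degree on the training set,
    then pick the degree with the lowest cross-validation error."""
    best_d, best_err, best_coeffs = None, np.inf, None
    for d in range(1, max_degree + 1):
        coeffs = np.polyfit(x_train, y_train, d)      # learn parameters on train
        pred = np.polyval(coeffs, x_cv)               # predict on the CV set
        j_cv = np.mean((pred - y_cv) ** 2) / 2        # squared-error CV cost
        if j_cv < best_err:
            best_d, best_err, best_coeffs = d, j_cv, coeffs
    return best_d, best_coeffs
```

Note that the CV set is used only to compare degrees; the parameters themselves are never fit on it.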
3. 📗 Test set
The remaining 20-30% of the data
- Test error tells us how well the model generalizes.
After choosing $d^*$, estimate the generalization error using:

$$J_{test}(\theta^{(d^*)}) = \frac{1}{2m_{test}} \sum_{i=1}^{m_{test}} \left( h_{\theta^{(d^*)}}(x_{test}^{(i)}) - y_{test}^{(i)} \right)^2$$
The test set is used only once, at the very end; it must remain untouched until then.
Test Set Error Examples
1. Linear Regression
For linear regression, the test error is:

$$J_{test}(\theta) = \frac{1}{2m_{test}} \sum_{i=1}^{m_{test}} \left( h_\theta(x_{test}^{(i)}) - y_{test}^{(i)} \right)^2$$

where:
- $m_{test}$ is the number of test examples
- $h_\theta$ is the hypothesis function
This measures the average squared error on unseen data.
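This cost can be computed directly from the formula above; `j_test_linear` is a hypothetical helper name, and `X_test` is assumed to include a bias column so that $h_\theta(x) = \theta^T x$:

```python
import numpy as np

def j_test_linear(theta, X_test, y_test):
    """Average squared test error: (1 / (2 * m_test)) * sum((h(x) - y)^2),
    where h(x) = X @ theta for linear regression with a bias column."""
    m_test = X_test.shape[0]
    residuals = X_test @ theta - y_test
    return np.sum(residuals ** 2) / (2 * m_test)
```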
2. Classification Logistic Regression
Given a training set, learn the parameter vector $\theta$ by minimizing the logistic regression cost function:

$$J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log h_\theta(x^{(i)}) + (1 - y^{(i)}) \log\left(1 - h_\theta(x^{(i)})\right) \right]$$

where

$$h_\theta(x) = \frac{1}{1 + e^{-\theta^T x}}$$
After learning $\theta$ using the training set, evaluate performance on the test set.
The test set cost is:

$$J_{test}(\theta) = -\frac{1}{m_{test}} \sum_{i=1}^{m_{test}} \left[ y_{test}^{(i)} \log h_\theta(x_{test}^{(i)}) + (1 - y_{test}^{(i)}) \log\left(1 - h_\theta(x_{test}^{(i)})\right) \right]$$
Important:
- $\theta$ is not retrained on the test set.
- We simply plug the learned $\theta$ into the test cost formula.
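A minimal sketch of this evaluation, assuming a bias column in `X_test` and a hypothetical helper name `j_test_logistic`:

```python
import numpy as np

def j_test_logistic(theta, X_test, y_test):
    """Logistic test cost: -(1/m_test) * sum(y*log(h) + (1-y)*log(1-h)),
    with h(x) = sigmoid(theta^T x). theta is NOT retrained here."""
    h = 1.0 / (1.0 + np.exp(-(X_test @ theta)))   # sigmoid hypothesis
    return -np.mean(y_test * np.log(h) + (1 - y_test) * np.log(1 - h))
```

In practice one would clip `h` away from exactly 0 or 1 to avoid `log(0)`.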
Misclassification error
For classification, we often use misclassification error (also called 0/1 error).
Define:

$$err\left(h_\theta(x), y\right) = \begin{cases} 1 & \text{if } h_\theta(x) \ge 0.5,\ y = 0 \ \text{ or } \ h_\theta(x) < 0.5,\ y = 1 \\ 0 & \text{otherwise} \end{cases}$$

This gives:
- 1 for an incorrect prediction
- 0 for a correct prediction
Classification Average Test Error
The overall test error is:

$$\text{Test Error} = \frac{1}{m_{test}} \sum_{i=1}^{m_{test}} err\left(h_\theta(x_{test}^{(i)}), y_{test}^{(i)}\right)$$
This gives the proportion of test examples that were misclassified.
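The 0/1 error above can be computed in a few lines; `misclassification_error` is a hypothetical helper name, and `X_test` is again assumed to include a bias column:

```python
import numpy as np

def misclassification_error(theta, X_test, y_test):
    """Fraction of test examples where the thresholded prediction
    (h >= 0.5 -> 1, else 0) disagrees with the true label."""
    h = 1.0 / (1.0 + np.exp(-(X_test @ theta)))
    predictions = (h >= 0.5).astype(int)
    return np.mean(predictions != y_test)
```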
Error Analysis
A practical and effective approach to solving machine learning problems is:
- Start with a simple algorithm
- Implement it quickly
- Evaluate it early using cross-validation data
Avoid over-engineering before you understand where the model is failing.
Step 1 — Plot Learning Curves
Learning curves help answer questions like:
- Would more training data help?
- Is the model suffering from high bias?
- Is it suffering from high variance?
- Would more features improve performance?
They give direction before investing more time.
Step 2 — Manually Inspect Errors
After evaluating on the cross-validation set:
- Look at misclassified examples
- Try to identify patterns in the errors
Example
Suppose:
- 500 total emails
- 100 misclassified
Instead of guessing improvements, manually inspect those 100 emails.
You might categorize them:
- Phishing emails
- Promotional emails
- Personal emails
- Password theft attempts
If most errors are password-theft emails, that suggests the model is missing features specific to that category.
You could then:
- Add features related to suspicious links
- Add features related to urgent security language
- Detect specific keyword patterns
Step 3 — Try Improvements Systematically
Every time you introduce a change:
- Add a feature
- Apply stemming
- Modify preprocessing
- Adjust regularization
You must measure the impact using a single numerical metric.
Without a numerical value, you cannot objectively compare changes.
Example: Stemming
Stemming treats variations of a word as the same root: *fail*, *failing*, *failed* → *fail*.
If stemming lowers the cross-validation error rate, that is an improvement. Keep it.
Example: Case Sensitivity
Suppose distinguishing between uppercase and lowercase increases the error rate. That is worse. Do not keep the feature.
Core Principle
Always:
- Make one change at a time
- Measure cross-validation error
- Keep only changes that reduce error
Avoid guessing. Let the data guide decisions.
When troubleshooting prediction errors, common options include:
- Getting more training examples
- Trying smaller sets of features
- Adding new features
- Trying polynomial features
- Increasing or decreasing the regularization parameter $\lambda$

Before investing in any of these, we need a reliable way to evaluate the new hypothesis.
Key Insight
Error analysis turns machine learning from random tweaking into a systematic engineering process.
Instead of asking:
"What should I try next?"
You ask:
"Where is the model failing, and why?"
Then improve it in a targeted way.
