

Evaluating a Hypothesis in Neural Networks

Learn how neural networks evaluate a hypothesis using forward propagation. Understand how inputs pass through layers, weights, and activation functions to produce predictions in machine learning models.

Written by Hitesh Sahu, a passionate developer and blogger.

Fri Feb 27 2026


📋 Evaluating a Hypothesis

A model that fits the training data very well is not necessarily a good hypothesis.

A model can have low training error but still perform poorly on new data due to overfitting:

  • Low training error
  • High error on unseen data

Choosing Between Multiple Models

Suppose we are trying polynomial regression with different degrees:

$d = 1, 2, 3, \dots$

Each degree defines a different hypothesis class.

We need a principled way to choose the best $d$ without biasing our evaluation.

A good model has:

  • Low training error
  • Low test error

If training error is low but test error is high, the model is overfitting.

To properly select a model:

  1. Train parameters on the training set
  2. Choose model complexity using the cross-validation set
  3. Report final performance using the test set

Splitting the Dataset

To properly evaluate performance, we split the dataset into three parts. A common split is:

  • Training set: 60%
  • Cross-validation set: 20%
  • Test set: 20%
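A minimal NumPy sketch of this 60/20/20 split (the `split_dataset` helper and the toy data are illustrative, not part of the original post):

```python
import numpy as np

def split_dataset(X, y, seed=0):
    """Shuffle and split the data 60/20/20 into train / cross-validation / test."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))          # shuffle before splitting
    m = len(X)
    n_train, n_cv = int(0.6 * m), int(0.8 * m)
    train, cv, test = idx[:n_train], idx[n_train:n_cv], idx[n_cv:]
    return (X[train], y[train]), (X[cv], y[cv]), (X[test], y[test])

# Toy data: 10 examples with 2 features each
X = np.arange(20).reshape(10, 2)
y = np.arange(10)
(train, cv, test) = split_dataset(X, y)
print(len(train[0]), len(cv[0]), len(test[0]))  # 6 2 2
```

Shuffling first matters: if the data is ordered (e.g. by class), a naive slice would give unrepresentative splits.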

1. 📚 Training Set $J_{\text{train}}(\Theta)$

Typically 60-70% of the data.

  • Training error tells us how well the model fits known data.

Used to learn the parameters $\Theta$ by minimizing the training error $J_{\text{train}}(\Theta)$ using only the training set.

2. 📘 Cross-Validation Set $J_{\text{cv}}(\Theta)$

Used for Model Selection (Validation)

For each trained model $\Theta^{(d)}$, compute:

$J_{\text{cv}}\big(\Theta^{(d)}\big)$

using the cross-validation set.

Choose the polynomial degree:

$d^* = \arg\min_d J_{\text{cv}}\big(\Theta^{(d)}\big)$

This selects the model that generalizes best among the candidates.
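The selection rule above can be sketched with NumPy's polynomial helpers (the synthetic data and the candidate degrees 1-5 are assumptions for illustration):

```python
import numpy as np

def fit_poly(X, y, d):
    # Training step: least-squares fit of a degree-d polynomial
    return np.polyfit(X, y, d)

def j_cv(theta, X_cv, y_cv):
    # Mean squared error on the cross-validation set
    preds = np.polyval(theta, X_cv)
    return np.mean((preds - y_cv) ** 2)

rng = np.random.default_rng(0)
X_train = np.linspace(-1, 1, 30)
y_train = X_train**2 + 0.05 * rng.standard_normal(30)  # noisy quadratic
X_cv = np.linspace(-1, 1, 10)
y_cv = X_cv**2

# Train one model per candidate degree, then pick d* = argmin_d J_cv
models = {d: fit_poly(X_train, y_train, d) for d in range(1, 6)}
d_star = min(models, key=lambda d: j_cv(models[d], X_cv, y_cv))
print(d_star)
```

Note that $d^*$ is chosen from the CV error only; the test set plays no role in this step.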

3. 📗 Test Set $J_{\text{test}}(\Theta)$

The remaining 20-30% of the data.

  • Test error tells us how well the model generalizes.

After choosing $d^*$, estimate the generalization error using:

$J_{\text{test}}\big(\Theta^{(d^*)}\big)$

The test set is used only once, at the very end; it must remain untouched during training and model selection.

Test Set Error Examples

1. Linear Regression

For linear regression, the test error is:

$$J_{\text{test}}(\Theta) = \frac{1}{2 m_{\text{test}}} \sum_{i=1}^{m_{\text{test}}} \left( h_\Theta\big(x_{\text{test}}^{(i)}\big) - y_{\text{test}}^{(i)} \right)^2$$

where:

  • $m_{\text{test}}$ is the number of test examples
  • $h_\Theta(x)$ is the hypothesis function

This measures the average squared error on unseen data.
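A direct NumPy translation of this formula (the tiny test set and parameter values are made up for illustration; features are stored as rows with a leading bias column):

```python
import numpy as np

def j_test(theta, X_test, y_test):
    """Average squared error on the test set, with the conventional 1/2 factor."""
    m = len(y_test)
    preds = X_test @ theta                     # h_theta(x) = theta^T x for each row
    return np.sum((preds - y_test) ** 2) / (2 * m)

X_test = np.array([[1.0, 1.0], [1.0, 2.0]])    # bias column + one feature
y_test = np.array([2.0, 3.0])
theta = np.array([1.0, 1.0])                   # h(x) = 1 + x fits these points exactly
print(j_test(theta, X_test, y_test))           # 0.0
```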

2. Classification: Logistic Regression

Given a training set, learn the parameter vector $\Theta$ by minimizing the logistic regression cost function:

$$J_{\text{train}}(\Theta) = -\frac{1}{m_{\text{train}}} \sum_{i=1}^{m_{\text{train}}} \left[ y^{(i)} \log h_\Theta(x^{(i)}) + (1 - y^{(i)}) \log \big(1 - h_\Theta(x^{(i)})\big) \right]$$

where

$$h_\Theta(x) = \sigma(\Theta^T x) = \frac{1}{1 + e^{-\Theta^T x}}$$

After learning $\Theta$ using the training set, evaluate performance on the test set.

The test set cost is:

$$J_{\text{test}}(\Theta) = -\frac{1}{m_{\text{test}}} \sum_{i=1}^{m_{\text{test}}} \left[ y_{\text{test}}^{(i)} \log h_\Theta(x_{\text{test}}^{(i)}) + (1 - y_{\text{test}}^{(i)}) \log \big(1 - h_\Theta(x_{\text{test}}^{(i)})\big) \right]$$

Important:

  • $\Theta$ is not retrained on the test set.
  • We simply plug the learned $\Theta$ into the test cost formula.
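Plugging a learned $\Theta$ into the test cost looks like this in NumPy (the two-example test set and parameter vector are hypothetical):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def j_test_logistic(theta, X_test, y_test):
    """Cross-entropy cost on the test set; theta is fixed, never retrained here."""
    h = sigmoid(X_test @ theta)
    return -np.mean(y_test * np.log(h) + (1 - y_test) * np.log(1 - h))

X_test = np.array([[1.0, 2.0], [1.0, -2.0]])   # bias column + one feature
y_test = np.array([1.0, 0.0])
theta = np.array([0.0, 1.0])                    # hypothetical learned parameters

cost = j_test_logistic(theta, X_test, y_test)
print(round(cost, 4))  # 0.1269
```

A production version would clip `h` away from 0 and 1 to avoid `log(0)` for extreme predictions.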

Misclassification error

For classification, we often use misclassification error (also called 0/1 error).

Define:

$$\text{err}(h_\Theta(x), y) = \begin{cases} 1 & \text{if } h_\Theta(x) \ge 0.5 \text{ and } y = 0 \\ 1 & \text{if } h_\Theta(x) < 0.5 \text{ and } y = 1 \\ 0 & \text{otherwise} \end{cases}$$

This gives:

  • 1 for an incorrect prediction
  • 0 for a correct prediction

Classification Average Test Error

The overall test error is:

$$\text{Test Error} = \frac{1}{m_{\text{test}}} \sum_{i=1}^{m_{\text{test}}} \text{err}\big( h_\Theta(x_{\text{test}}^{(i)}), y_{\text{test}}^{(i)} \big)$$

This gives the proportion of test examples that were misclassified.
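A minimal sketch of the 0/1 error (the hypothesis outputs and labels below are invented for illustration):

```python
import numpy as np

def misclassification_error(h, y, threshold=0.5):
    """Fraction of examples where the thresholded prediction disagrees with y."""
    preds = (h >= threshold).astype(int)   # h >= 0.5 predicts class 1
    return np.mean(preds != y)

h = np.array([0.9, 0.3, 0.6, 0.2])   # hypothesis outputs on the test set
y = np.array([1, 1, 0, 0])           # true labels
print(misclassification_error(h, y))  # 0.5 (the 2nd and 3rd examples are wrong)
```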


Error Analysis

A practical and effective approach to solving machine learning problems is:

  1. Start with a simple algorithm
  2. Implement it quickly
  3. Evaluate it early using cross-validation data

Avoid over-engineering before you understand where the model is failing.

Step 1 — Plot Learning Curves

Learning curves help answer questions like:

  • Would more training data help?
  • Is the model suffering from high bias?
  • Is it suffering from high variance?
  • Would more features improve performance?

They give direction before investing more time.

Step 2 — Manually Inspect Errors

After evaluating on the cross-validation set:

  • Look at misclassified examples
  • Try to identify patterns in the errors

Example

Suppose:

  • 500 total emails
  • 100 misclassified

Instead of guessing improvements, manually inspect those 100 emails.

You might categorize them:

  • Phishing emails
  • Promotional emails
  • Personal emails
  • Password theft attempts

If most errors are password-theft emails, that suggests the model is missing features specific to that category.

You could then:

  • Add features related to suspicious links
  • Add features related to urgent security language
  • Detect specific keyword patterns
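The manual categorization above can be tallied with a simple counter to see where to invest effort (the category labels and counts here are hypothetical, matching the 100-email example):

```python
from collections import Counter

# Hypothetical labels assigned while manually inspecting 100 misclassified emails
error_categories = (
    ["password theft"] * 53
    + ["phishing"] * 22
    + ["promotional"] * 15
    + ["personal"] * 10
)

counts = Counter(error_categories)
for category, n in counts.most_common():   # largest category first
    print(f"{category}: {n}")
```

Sorting by frequency makes the dominant failure mode, and hence the most promising feature work, obvious at a glance.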

Step 3 — Try Improvements Systematically

Every time you introduce a change:

  • Add a feature
  • Apply stemming
  • Modify preprocessing
  • Adjust regularization

You must measure the impact using a single numerical metric.

Without a numerical value, you cannot objectively compare changes.

Example: Stemming

Stemming treats variations of a word as the same root, e.g. fail, failing, failed.

If error rate drops from:

$5\% \rightarrow 3\%$

That is a strong improvement. Keep it.

Example: Case Sensitivity

Suppose distinguishing between uppercase and lowercase changes error from:

$3\% \rightarrow 3.2\%$

That is worse. Do not keep the feature.

Core Principle

Always:

  1. Make one change at a time
  2. Measure cross-validation error
  3. Keep only changes that reduce error

Avoid guessing. Let the data guide decisions.
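This keep-only-improvements loop can be sketched as follows; the candidate names and error numbers mirror the stemming and case-sensitivity examples above, and `evaluate` is a stand-in for actually retraining and measuring cross-validation error:

```python
def select_changes(baseline_error, candidates, evaluate):
    """Apply one candidate change at a time; keep it only if CV error drops."""
    kept, error = [], baseline_error
    for name, change in candidates:
        new_error = evaluate(change)       # retrain + measure CV error in practice
        if new_error < error:              # keep only changes that reduce error
            kept.append(name)
            error = new_error
    return kept, error

# Hypothetical pre-measured CV errors for each change
candidates = [("stemming", 0.03), ("case sensitivity", 0.032)]
kept, final = select_changes(0.05, candidates, evaluate=lambda e: e)
print(kept, final)  # ['stemming'] 0.03
```

Stemming drops the error from 5% to 3% and is kept; case sensitivity would raise it back to 3.2% and is rejected.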

When troubleshooting prediction errors, we might consider:

  • Getting more training examples
  • Trying smaller sets of features
  • Adding new features
  • Trying polynomial features
  • Increasing or decreasing $\lambda$

Before trying any of these, we need a reliable way to evaluate the new hypothesis.


Key Insight

Error analysis turns machine learning from random tweaking into a systematic engineering process.

Instead of asking:

"What should I try next?"

You ask:

"Where is the model failing, and why?"

Then improve it in a targeted way.
