


Normal Equation in Linear Regression: Formula, Intuition, and Comparison with Gradient Descent

Understand the Normal Equation in linear regression, its closed-form solution, mathematical formula, advantages, limitations, and how it compares to gradient descent for model optimization.

Hitesh Sahu
Written by Hitesh Sahu, a passionate developer and blogger.

Fri Feb 27 2026


Normal Equation (Closed-Form Solution)

Instead of running many iterations of gradient descent, the normal equation computes θ in a single step.

  • Using calculus, θ can be calculated directly at the point where the cost function is minimal, in one step, with no iterative optimization:

θ = (XᵀX)⁻¹ Xᵀ y
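A sketch of where this closed form comes from: minimize the least-squares cost by setting its gradient to zero (assuming XᵀX is invertible).

```latex
J(\theta) = \frac{1}{2m}\,\lVert X\theta - y \rVert^2
\qquad
\nabla_\theta J = \frac{1}{m}\, X^T (X\theta - y) = 0
\;\Longrightarrow\;
X^T X\,\theta = X^T y
\;\Longrightarrow\;
\theta = (X^T X)^{-1} X^T y
```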

Advantages

  • No learning rate required
  • Direct computation

Limitations

  • Computationally expensive for very large datasets
  • Matrix inversion can be costly

Steps:

  • Construct the design matrix X from the feature columns, adding a column of 1s as the first column
  • Construct the target vector y from the result values
  • Calculate:

θ = (XᵀX)⁻¹ Xᵀ y
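The steps above can be sketched in NumPy (the data here is a hypothetical toy example, not from the post):

```python
import numpy as np

# Hypothetical toy data: one feature, m = 4 samples, with y = 1 + 2x exactly.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([3.0, 5.0, 7.0, 9.0])

# Step 1: design matrix X with a column of 1s prepended to the feature column.
X = np.column_stack([np.ones_like(x), x])

# Step 3: theta = (X^T X)^{-1} X^T y.
# Solving the linear system is preferred over explicitly inverting X^T X.
theta = np.linalg.solve(X.T @ X, X.T @ y)
print(theta)  # → approximately [1. 2.]
```

Since the data is exactly linear, θ recovers the intercept 1 and slope 2.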

Mean Normalization

Feature scaling (mean normalization) is not required for the normal equation method.
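One way to see this is that fitting on raw and on mean-normalized features yields different parameter vectors but the same fitted predictions; a small sketch with made-up data:

```python
import numpy as np

# Hypothetical data with a large-scale feature (y = 1 + 0.04x exactly).
x = np.array([100.0, 200.0, 300.0, 400.0])
y = np.array([5.0, 9.0, 13.0, 17.0])

def fit(feat):
    """Fit via the normal equation; returns the design matrix and theta."""
    X = np.column_stack([np.ones_like(feat), feat])
    return X, np.linalg.solve(X.T @ X, X.T @ y)

X_raw, th_raw = fit(x)                            # unscaled feature
X_scl, th_scl = fit((x - x.mean()) / x.std())     # mean-normalized feature

# The parameter vectors differ, but the fitted predictions agree.
print(np.allclose(X_raw @ th_raw, X_scl @ th_scl))  # → True
```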

Normal Equation vs Gradient Descent:

| Feature | Gradient Descent | Normal Equation |
| --- | --- | --- |
| Implementation | More complex; α needs debugging | Convenient and simple to implement |
| Learning rate (α) | Must be chosen | Not needed |
| Feature scaling | Required | Not needed |
| Iterations | Many iterations required | Not required |
| Feature set ≥ a million (huge n) | Efficient even when n is huge, O(kn²) | Slow when n is huge; matrix inversion costs O(n³) |
| Complex learning algorithms | Can be used for complex learning algorithms | Not supported |
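The comparison can be checked numerically: on a small synthetic dataset (hypothetical, for illustration) both methods reach the same θ, but gradient descent needs many iterations and a hand-tuned α while the normal equation needs neither.

```python
import numpy as np

# Hypothetical synthetic data: 50 samples, true theta = [2, -3], small noise.
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(50), rng.uniform(0, 1, 50)])
y = X @ np.array([2.0, -3.0]) + 0.01 * rng.standard_normal(50)

# Normal equation: one step, no learning rate.
theta_ne = np.linalg.solve(X.T @ X, X.T @ y)

# Batch gradient descent: many steps, learning rate alpha chosen by hand.
theta_gd = np.zeros(2)
alpha, m = 0.5, len(y)
for _ in range(5000):
    theta_gd -= alpha / m * X.T @ (X @ theta_gd - y)

print(np.allclose(theta_ne, theta_gd, atol=1e-6))  # → True
```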

Usage:

  • Faster single hypothesis prediction given a data set and the θ values
  • Much faster than nested for loops

Data Matrix × Parameter Vector = Prediction Vector

  given h(x) = θ₀ + θ₁x
  [1 , x] · [θ vector] = [h(x)]

Usage:

  • Faster multiple hypothesis predictions given a data set and the θ values
  • Much faster than nested for loops

Data Matrix × Parameter Matrix = Prediction Matrix

  given h(x) = θ₀ + θ₁x
  [1 , x] · [θ matrix] = [h(x)]
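Both cases above reduce to one matrix product; a small sketch with hypothetical numbers, stacking two candidate θ vectors side by side:

```python
import numpy as np

# Design matrix: m = 3 samples, each row is [1, x].
X = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])

# Parameter matrix: each COLUMN is one theta vector [theta0, theta1].
Theta = np.array([[0.0, 1.0],
                  [2.0, 0.5]])

# One matmul replaces nested loops over samples and hypotheses:
# column j of H holds h_j(x) for every sample.
H = X @ Theta
print(H)  # → [[2.  1.5]
          #    [4.  2. ]
          #    [6.  2.5]]
```

With a single θ column this is exactly the vector case; adding columns gives all hypotheses' predictions at once.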

AI-Machine-Learning/4-Normal-Equation