


Backpropagation Algorithm

Backpropagation is the algorithm used to minimize the neural network cost function. It computes the gradients of the cost function with respect to the parameters, allowing us to perform gradient descent and update our model.

Written by Hitesh Sahu, a passionate developer and blogger.

Fri Feb 27 2026


⏪ Backpropagation Algorithm (BP)

Backpropagation is the algorithm used to minimize the neural network cost function.

Just like gradient descent in linear and logistic regression, our goal is:

$$\min_\Theta J(\Theta)$$

That is, we want to find parameters $\Theta$ that minimize the cost function.

Where

$$J(\Theta) = -\frac{1}{m} \sum_{i=1}^{m} \sum_{k=1}^{K} \left[ y_k^{(i)} \log\big((h_\Theta(x^{(i)}))_k\big) + (1 - y_k^{(i)}) \log\big(1 - (h_\Theta(x^{(i)}))_k\big) \right] + \frac{\lambda}{2m} \sum_{l=1}^{L-1} \sum_{i=1}^{s_l} \sum_{j=1}^{s_{l+1}} \big(\Theta_{j,i}^{(l)}\big)^2$$
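As a sketch, this cost can be computed in NumPy. All names here (`cost`, `H`, `Y`, `Thetas`) are illustrative, assuming the predictions $(h_\Theta(x^{(i)}))_k$ and one-hot labels are already available:

```python
import numpy as np

def cost(H, Y, Thetas, lam):
    """Regularized cross-entropy cost J(Theta) -- illustrative sketch.

    H:      (m, K) predicted probabilities (h_Theta(x^(i)))_k
    Y:      (m, K) one-hot true labels y^(i)
    Thetas: list of weight matrices Theta^(l); column 0 holds bias weights
    lam:    regularization strength lambda
    """
    m = Y.shape[0]
    # Cross-entropy term, summed over all examples and output units
    data_term = -np.sum(Y * np.log(H) + (1 - Y) * np.log(1 - H)) / m
    # Regularization term: squared non-bias weights only
    reg_term = lam / (2 * m) * sum(np.sum(T[:, 1:] ** 2) for T in Thetas)
    return data_term + reg_term
```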

Objective

We want to compute the partial derivatives:

$$\frac{\partial}{\partial \Theta_{i,j}^{(l)}} J(\Theta)$$

These derivatives are used in gradient descent to update the parameters.
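Once these derivatives are available, each weight is moved a small step against the gradient. A tiny illustrative sketch (the values of `Theta` and `D` are made up):

```python
import numpy as np

alpha = 0.1                     # learning rate
Theta = np.array([[0.5, -0.5]])
D = np.array([[1.0, 2.0]])      # stands in for dJ/dTheta from backpropagation

# Gradient-descent update: Theta := Theta - alpha * D
Theta = Theta - alpha * D
```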


❗ Loss Function and the Error Term ($\delta_j^{(l)}$)

A loss function measures how wrong the model's prediction is compared to the actual value. It answers one simple question: how far off was the prediction?

For example, suppose the actual house price is €500,000. The training process tries to minimize the loss:

  • The model makes a prediction: €480,000.
  • The loss function calculates the error: €20,000.
  • The optimizer adjusts the parameters to reduce that error.

Repeat this thousands of times and the model improves.

Error is represented as:

$$\delta_j^{(l)} = \text{Predicted} - \text{Actual}$$

In the example above, this error has magnitude €20,000.

where $\delta_j^{(l)}$ represents the error of unit $j$ in layer $l$.

More formally:

$$\delta_j^{(l)} = \frac{\partial}{\partial z_j^{(l)}} \text{cost}(t)$$

So:

  • $\delta$ is the derivative of the cost with respect to $z$
  • It measures how much that unit contributed to the error
  • Larger magnitude → steeper slope → more incorrect

The loss function converts the error into a number the model can optimize.

⚖️ Loss vs Cost Function

  • Loss → error for one example
  • Cost → average loss over the dataset
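In code, the distinction is just "one example" versus "an average over all examples". A small sketch using absolute error for the house-price illustration above (function names are hypothetical):

```python
def loss(predicted, actual):
    # Error for ONE example (the €20,000 in the example above)
    return abs(predicted - actual)

def cost(predictions, actuals):
    # Average loss over the WHOLE dataset
    return sum(loss(p, a) for p, a in zip(predictions, actuals)) / len(predictions)
```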

Gradient Computation (Backpropagation)

  • Forward propagation → computes activations.
  • Backpropagation → computes errors ($\delta$ values).
  • Errors are propagated from right to left.
  • Gradients are accumulated in $\Delta$.
  • Regularization is added for non-bias weights.
  • Finally, we divide by $m$ to obtain the average gradient.

Backpropagation Algorithm

Given a training set of $m$ examples:

$$\{(x^{(1)}, y^{(1)}), \dots, (x^{(m)}, y^{(m)})\}$$

Step 1: 🌱 Initialize Accumulators

Set:

$$\Delta_{i,j}^{(l)} := 0$$

for all $l, i, j$.

This creates matrices of zeros to accumulate gradients.
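For a hypothetical network with layer sizes 2–3–1, the accumulators could be created like this. Each $\Delta^{(l)}$ has the same shape as $\Theta^{(l)}$, i.e. $s_{l+1} \times (s_l + 1)$, where the extra column is for the bias:

```python
import numpy as np

# Hypothetical architecture: 2 inputs, 3 hidden units, 1 output
layer_sizes = [2, 3, 1]

# One zero matrix per weight layer; shape (s_{l+1}, s_l + 1)
Deltas = [np.zeros((layer_sizes[l + 1], layer_sizes[l] + 1))
          for l in range(len(layer_sizes) - 1)]
```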

Step 2: For each training example $t = 1$ to $m$

Backpropagation works per example, and gradients are summed (or averaged) over the dataset.

Example: for two training examples $(x^{(1)}, y^{(1)})$ and $(x^{(2)}, y^{(2)})$:

  • Compute forward propagation for $(x^{(1)}, y^{(1)})$, then backpropagation for $(x^{(1)}, y^{(1)})$.
  • Compute forward propagation for $(x^{(2)}, y^{(2)})$, then backpropagation for $(x^{(2)}, y^{(2)})$.
  • Finally, average (or sum) the gradients.

2.1 ⏩ Forward Propagation

Set:

$$a^{(1)} := x^{(t)}$$

Compute forward propagation for:

$$l = 2, 3, \dots, L$$

to obtain activations $a^{(l)}$ for every network layer $l$:

$$z^{(l)} = \Theta^{(l-1)} a^{(l-1)}$$
$$a^{(l)} = g(z^{(l)})$$

Or, looking forward:

$$z^{(l+1)} = \Theta^{(l)} a^{(l)}$$
$$a^{(l+1)} = g(z^{(l+1)})$$

Where

  • $a^{(l)}$ = activations of layer $l$
  • $z^{(l)}$ = linear combination before activation
  • $\Theta^{(l)}$ = weight matrix between layer $l$ and layer $l+1$
  • $g(\cdot)$ = activation function
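A minimal NumPy sketch of this forward pass, assuming sigmoid activations (all names are illustrative; a bias unit $a_0 = 1$ is prepended before each matrix product):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, Thetas):
    """Forward propagation for one example.

    Returns the activations a^(1)..a^(L) (stored WITHOUT bias units)
    and the pre-activations z^(2)..z^(L).
    """
    a = x
    activations, zs = [a], []
    for Theta in Thetas:
        a_bias = np.concatenate(([1.0], a))  # prepend bias unit a_0 = 1
        z = Theta @ a_bias                   # z^(l+1) = Theta^(l) a^(l)
        a = sigmoid(z)                       # a^(l+1) = g(z^(l+1))
        zs.append(z)
        activations.append(a)
    return activations, zs
```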

2.2 ❗ Compute Output Layer Error ($\delta^{(L)}$)

Using the true label $y^{(t)}$:

$$\delta^{(L)} = a^{(L)} - y^{(t)}$$

This is the error of the output layer.

2.3 ⏪ Backpropagate the Errors $\delta^{(L-1)}, \delta^{(L-2)}, \dots, \delta^{(2)}$

For layers:

$$l = L-1, L-2, \dots, 2$$

Compute:

$$\delta^{(l)} = \left( (\Theta^{(l)})^T \delta^{(l+1)} \right) \;.\!*\; g'(z^{(l)})$$

For sigmoid activation:

$$g'(z^{(l)}) = a^{(l)} \;.\!*\; (1 - a^{(l)})$$

So equivalently:

$$\delta^{(l)} = \left( (\Theta^{(l)})^T \delta^{(l+1)} \right) \;.\!*\; a^{(l)} \;.\!*\; (1 - a^{(l)})$$

The operator $.\!*$ denotes element-wise multiplication.
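The backward pass above could be sketched like this in NumPy (illustrative names; hidden activations are stored without bias units, so the bias column of each $\Theta^{(l)}$ is dropped before propagating the error back):

```python
import numpy as np

def backprop_deltas(activations, Thetas, y):
    """Compute delta^(L), ..., delta^(2) for one example.

    activations: a^(1)..a^(L), hidden layers WITHOUT bias units
    Thetas:      Theta^(1)..Theta^(L-1); column 0 holds bias weights
    """
    L = len(activations)
    deltas = {L: activations[-1] - y}      # delta^(L) = a^(L) - y
    for l in range(L - 1, 1, -1):          # l = L-1, ..., 2
        Theta = Thetas[l - 1]              # Theta^(l): maps layer l -> l+1
        a = activations[l - 1]             # a^(l)
        # (Theta^(l))^T delta^(l+1) with the bias column dropped, times
        # the sigmoid derivative a .* (1 - a), element-wise
        deltas[l] = (Theta[:, 1:].T @ deltas[l + 1]) * a * (1 - a)
    return deltas
```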

2.4 📥 Accumulate Gradients

Update:

$$\Delta_{i,j}^{(l)} := \Delta_{i,j}^{(l)} + a_j^{(l)} \delta_i^{(l+1)}$$

Vectorized form:

$$\Delta^{(l)} := \Delta^{(l)} + \delta^{(l+1)} (a^{(l)})^T$$
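The vectorized update is just an outer product. A small numeric sketch with made-up values:

```python
import numpy as np

delta_next = np.array([0.3, -0.1])   # delta^(l+1)
a = np.array([1.0, 0.5, 0.2])        # a^(l), with bias unit 1.0 prepended
Delta = np.zeros((2, 3))             # accumulator Delta^(l)

# Delta^(l) := Delta^(l) + delta^(l+1) (a^(l))^T   (outer product)
Delta += np.outer(delta_next, a)
```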

Step 3: 🎢 Compute Gradients

After processing all training examples:

For $j \ne 0$ (non-bias terms):

$$D_{i,j}^{(l)} = \frac{1}{m} \left( \Delta_{i,j}^{(l)} + \lambda \Theta_{i,j}^{(l)} \right)$$

For bias terms ($j = 0$):

$$D_{i,j}^{(l)} = \frac{1}{m} \Delta_{i,j}^{(l)}$$

Final Result

The gradient of the cost function is:

$$\frac{\partial}{\partial \Theta_{i,j}^{(l)}} J(\Theta) = D_{i,j}^{(l)}$$

The matrix $D^{(l)}$ gives the partial derivatives used in gradient descent.
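Putting Step 3 into code: a sketch that turns an accumulator $\Delta^{(l)}$ into the gradient matrix $D^{(l)}$, regularizing everything except the bias column (names are illustrative):

```python
import numpy as np

def gradients(Delta, Theta, m, lam):
    """D^(l) from the accumulated Delta^(l); column 0 holds bias terms."""
    D = Delta / m                          # average over the m examples
    D[:, 1:] += (lam / m) * Theta[:, 1:]   # regularize non-bias weights only
    return D
```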


Example:

Given one training example $(x, y)$:

Layer 1 (Input)

⏩ Forward Propagation

$$a^{(1)} = x$$

⏪ Backward Propagation

No error term is associated with the input layer.


Layer 2

⏩ Forward Propagation

$$z^{(2)} = \Theta^{(1)} a^{(1)}$$
$$a^{(2)} = g(z^{(2)})$$

(Add bias unit if applicable.)

⏪ Backward Propagation

$$\delta^{(2)} = \left( (\Theta^{(2)})^T \delta^{(3)} \right) \odot g'(z^{(2)})$$

Layer 3

⏩ Forward Propagation

$$z^{(3)} = \Theta^{(2)} a^{(2)}$$
$$a^{(3)} = g(z^{(3)})$$

⏪ Backward Propagation

$$\delta^{(3)} = \left( (\Theta^{(3)})^T \delta^{(4)} \right) \odot g'(z^{(3)})$$

Layer 4 (Output)

⏩ Forward Propagation

$$z^{(4)} = \Theta^{(3)} a^{(3)}$$
$$a^{(4)} = h_\Theta(x) = g(z^{(4)})$$

⏪ Backward Propagation

Output layer error = predicted value − actual value:

$$\delta^{(4)} = a^{(4)} - y$$