Backpropagation Algorithm
Backpropagation is the algorithm used to minimize the neural network cost function. It computes the gradients of the cost function with respect to the parameters, allowing us to perform gradient descent and update our model.
Just like gradient descent in linear and logistic regression, our goal is:

$$\min_{\Theta} J(\Theta)$$

That is, we want to find parameters $\Theta$ that minimize the cost function $J(\Theta)$.
Objective
We want to compute the partial derivatives:

$$\frac{\partial}{\partial \Theta^{(l)}_{ij}} J(\Theta)$$

These derivatives are used in gradient descent to update the parameters.
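To make the role of these derivatives concrete, here is a minimal NumPy sketch of one gradient descent step. The matrices, their sizes, and the learning rate are made-up values for illustration, not anything fixed by the notes.

```python
import numpy as np

# Hypothetical parameter matrices for a small network (sizes are illustrative).
Theta1 = np.ones((3, 4))   # maps layer 1 (3 inputs + bias) to layer 2
Theta2 = np.ones((1, 4))   # maps layer 2 (3 units + bias) to the output

# Suppose D1, D2 hold the partial derivatives produced by backpropagation.
D1 = 0.1 * np.ones_like(Theta1)
D2 = 0.1 * np.ones_like(Theta2)

alpha = 0.5  # learning rate (an assumption for this sketch)

# One gradient descent step: Theta := Theta - alpha * D
Theta1 -= alpha * D1
Theta2 -= alpha * D2

print(Theta1[0, 0])  # 1.0 - 0.5 * 0.1 = 0.95
```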
Backpropagation Algorithm
Given training set:

$$\{(x^{(1)}, y^{(1)}), (x^{(2)}, y^{(2)}), \ldots, (x^{(m)}, y^{(m)})\}$$
Step 1: Initialize Accumulators
Set:

$$\Delta^{(l)}_{ij} := 0$$

for all $l, i, j$.

This creates matrices of zeros to accumulate the gradients.
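The initialization can be sketched in NumPy as follows; the layer sizes are hypothetical, and the `+1` column accounts for the bias unit in each layer.

```python
import numpy as np

# Hypothetical layer sizes: 3 inputs, 5 hidden units, 1 output.
layer_sizes = [3, 5, 1]

# One accumulator per weight matrix Theta^(l); the +1 column covers the bias unit.
Deltas = [np.zeros((layer_sizes[l + 1], layer_sizes[l] + 1))
          for l in range(len(layer_sizes) - 1)]

print([D.shape for D in Deltas])  # [(5, 4), (1, 6)]
```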
Step 2: For each training example $t = 1$ to $m$
2.1 Forward Propagation
Set:

$$a^{(1)} := x^{(t)}$$

Compute forward propagation for:

$$l = 2, 3, \ldots, L$$

to obtain activations $a^{(2)}, a^{(3)}, \ldots, a^{(L)}$.
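A minimal forward-propagation sketch in NumPy, assuming a sigmoid activation throughout and a bias unit prepended to every non-output layer; the tiny network and its zero weights are purely illustrative.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, Thetas):
    """Return the activations a^(1), ..., a^(L) for one example x."""
    a = np.concatenate(([1.0], x))       # a^(1) with bias unit prepended
    activations = [a]
    for l, Theta in enumerate(Thetas):
        z = Theta @ a                    # weighted sum z^(l+1)
        a = sigmoid(z)
        if l < len(Thetas) - 1:          # no bias unit on the output layer
            a = np.concatenate(([1.0], a))
        activations.append(a)
    return activations

# Hypothetical network: 2 inputs, 2 hidden units, 1 output, all-zero weights.
Thetas = [np.zeros((2, 3)), np.zeros((1, 3))]
acts = forward(np.array([1.0, 0.0]), Thetas)
print(acts[-1])  # sigmoid(0) everywhere -> [0.5]
```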
2.2 Compute Output Layer Error
Using the true label $y^{(t)}$:

$$\delta^{(L)} = a^{(L)} - y^{(t)}$$

This is the error of the output layer.
2.3 Backpropagate the Error
For layers:

$$l = L-1, L-2, \ldots, 2$$

Compute:

$$\delta^{(l)} = \left(\Theta^{(l)}\right)^T \delta^{(l+1)} \circ g'\!\left(z^{(l)}\right)$$

For sigmoid activation:

$$g'\!\left(z^{(l)}\right) = a^{(l)} \circ \left(1 - a^{(l)}\right)$$

So equivalently:

$$\delta^{(l)} = \left(\Theta^{(l)}\right)^T \delta^{(l+1)} \circ a^{(l)} \circ \left(1 - a^{(l)}\right)$$

The operator $\circ$ denotes element-wise multiplication.
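The two error computations above can be sketched for a single hidden layer as follows; the activations, label, and weights are made-up values for illustration.

```python
import numpy as np

# Hypothetical quantities for one example (values are illustrative).
a2 = np.array([1.0, 0.6, 0.4])         # hidden activations, bias unit first
a3 = np.array([0.7])                   # output activation
y  = np.array([1.0])                   # true label
Theta2 = np.array([[0.1, 0.2, 0.3]])   # weights from layer 2 to layer 3

# Output layer error: delta^(3) = a^(3) - y
delta3 = a3 - y

# Hidden layer error: (Theta^(2))^T delta^(3), element-wise times a .* (1 - a)
delta2 = (Theta2.T @ delta3) * a2 * (1.0 - a2)
delta2 = delta2[1:]                    # drop the bias-unit component

print(delta3, delta2)  # approximately [-0.3] [-0.0144 -0.0216]
```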
2.4 Accumulate Gradients
Update:

$$\Delta^{(l)}_{ij} := \Delta^{(l)}_{ij} + a^{(l)}_j \delta^{(l+1)}_i$$

Vectorized form:

$$\Delta^{(l)} := \Delta^{(l)} + \delta^{(l+1)} \left(a^{(l)}\right)^T$$
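In NumPy, the vectorized accumulation is an outer product; the accumulator, activations, and errors below are hypothetical values for one example.

```python
import numpy as np

# Hypothetical quantities for one training example (values are illustrative).
Delta1 = np.zeros((2, 3))              # accumulator for Theta^(1)
a1 = np.array([1.0, 0.5, -1.0])        # layer-1 activations, bias included
delta2 = np.array([0.2, -0.4])         # layer-2 errors

# Vectorized accumulation: Delta^(l) += delta^(l+1) (a^(l))^T
Delta1 += np.outer(delta2, a1)

print(Delta1)
```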
Step 3: Compute Gradients
After processing all training examples:

For $j \neq 0$ (non-bias terms):

$$D^{(l)}_{ij} := \frac{1}{m}\left(\Delta^{(l)}_{ij} + \lambda \Theta^{(l)}_{ij}\right)$$

For bias terms ($j = 0$):

$$D^{(l)}_{ij} := \frac{1}{m}\Delta^{(l)}_{ij}$$
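The averaging and regularization step can be sketched as below, where column 0 of each weight matrix holds the bias weights; the matrices, $m$, and $\lambda$ are assumptions for this example.

```python
import numpy as np

m, lam = 4, 0.01                 # number of examples and regularization strength
Delta1 = np.ones((2, 3))         # hypothetical accumulated gradients
Theta1 = 2.0 * np.ones((2, 3))   # hypothetical weights (column 0 = bias weights)

# D = (1/m) * Delta, adding (lambda/m) * Theta only for non-bias columns (j != 0)
D1 = Delta1 / m
D1[:, 1:] += (lam / m) * Theta1[:, 1:]

print(D1)  # bias column stays 0.25; other columns become 0.255
```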
Final Result
The gradient of the cost function is:

$$\frac{\partial}{\partial \Theta^{(l)}_{ij}} J(\Theta) = D^{(l)}_{ij}$$

The matrix $D^{(l)}$ gives the partial derivatives used in gradient descent.
Key Ideas
- Forward propagation computes activations.
- Backpropagation computes errors ($\delta$ values).
- Errors are propagated from right to left, i.e., from the output layer back toward the input.
- Gradients are accumulated in $\Delta^{(l)}$.
- Regularization is added for non-bias weights.
- Finally, we divide by $m$ to obtain the average gradient.
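Putting all the steps together, here is a sketch of one full pass of the algorithm for a three-layer sigmoid network in NumPy. The network shape, the data, and the weight initialization are assumptions for this example; the per-example loop, error formulas, accumulation, and regularization follow the steps above.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def backprop(X, Y, Theta1, Theta2, lam):
    """One pass of backpropagation for a 3-layer network.

    Returns (D1, D2), the regularized average gradients."""
    m = X.shape[0]
    Delta1 = np.zeros_like(Theta1)
    Delta2 = np.zeros_like(Theta2)
    for t in range(m):
        # 2.1 Forward propagation (bias unit prepended to layers 1 and 2)
        a1 = np.concatenate(([1.0], X[t]))
        a2 = np.concatenate(([1.0], sigmoid(Theta1 @ a1)))
        a3 = sigmoid(Theta2 @ a2)
        # 2.2 Output layer error: delta^(3) = a^(3) - y^(t)
        delta3 = a3 - Y[t]
        # 2.3 Backpropagate: (Theta^(2))^T delta^(3) .* a^(2) .* (1 - a^(2))
        delta2 = (Theta2.T @ delta3) * a2 * (1.0 - a2)
        delta2 = delta2[1:]                   # drop the bias-unit component
        # 2.4 Accumulate gradients: Delta^(l) += delta^(l+1) (a^(l))^T
        Delta2 += np.outer(delta3, a2)
        Delta1 += np.outer(delta2, a1)
    # Step 3: average, regularizing only non-bias columns (j != 0)
    D1 = Delta1 / m
    D2 = Delta2 / m
    D1[:, 1:] += (lam / m) * Theta1[:, 1:]
    D2[:, 1:] += (lam / m) * Theta2[:, 1:]
    return D1, D2

# Tiny made-up example: 2 inputs, 2 hidden units, 1 output, 4 examples.
rng = np.random.default_rng(0)
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
Y = np.array([[0.], [1.], [1.], [0.]])
Theta1 = rng.standard_normal((2, 3)) * 0.1
Theta2 = rng.standard_normal((1, 3)) * 0.1
D1, D2 = backprop(X, Y, Theta1, Theta2, lam=0.0)
print(D1.shape, D2.shape)  # (2, 3) (1, 3)
```

With $\lambda = 0$ the returned matrices equal the numerical gradient of the (cross-entropy) cost, which is a useful sanity check on any backpropagation implementation.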
