Backpropagation Intuition
Backpropagation is the algorithm used to compute the gradients of the cost function with respect to the parameters in a neural network. This post provides an intuitive understanding of how backpropagation works and why it is essential for training deep learning models.
Important Corrections
- The output layer error term should be: δ^(4) = a^(4) − y
- The cost function term must include proper parentheses around the second logarithm: cost(i) = −[ y^(i) log(h_Θ(x^(i))) + (1 − y^(i)) log(1 − h_Θ(x^(i))) ]
Simplified Cost (Binary Classification, No Regularization)
If we ignore multiclass outputs and regularization, the cost for training example i is:
cost(i) = −[ y^(i) log(h_Θ(x^(i))) + (1 − y^(i)) log(1 − h_Θ(x^(i))) ]
Intuitively, cost(i) plays a role similar to the squared error (h_Θ(x^(i)) − y^(i))²: it measures how well the network is doing on example i.
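As a minimal numerical sketch of this per-example cost (the helper name `example_cost` is hypothetical):

```python
import numpy as np

def example_cost(h, y):
    """Binary cross-entropy cost of one training example.
    h is the network output h_theta(x) in (0, 1); y is the label in {0, 1}."""
    return -(y * np.log(h) + (1 - y) * np.log(1 - h))

# A confident correct prediction costs little; an uncertain one costs more:
print(example_cost(0.9, 1))   # ~0.105
print(example_cost(0.5, 1))   # ~0.693
```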
What Is δ?
Intuitively:
δ_j^(l) represents the "error" of unit j in layer l.
More formally:
δ_j^(l) = ∂cost(i) / ∂z_j^(l)
So:
- δ_j^(l) is the derivative of the cost with respect to z_j^(l), the weighted input of unit j in layer l
- It measures how much that unit contributed to the error
- Larger magnitude → steeper slope → more incorrect
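This definition can be checked numerically. A small sketch, assuming a single sigmoid unit feeding the binary cross-entropy cost: the analytic error term δ = a − y should match a finite-difference derivative of the cost with respect to the weighted input z.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(z, y):
    # Binary cross-entropy of a single sigmoid unit with pre-activation z.
    a = sigmoid(z)
    return -(y * np.log(a) + (1 - y) * np.log(1 - a))

z, y = 0.7, 1.0
delta = sigmoid(z) - y   # analytic error term: delta = a - y

# Central finite difference of the cost with respect to z
eps = 1e-6
numeric = (cost(z + eps, y) - cost(z - eps, y)) / (2 * eps)
# delta and numeric agree to high precision
```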
How Backpropagation Works
Backpropagation computes the error terms from right to left: output layer first, then backward through the hidden layers.
We start at the output layer:
δ^(4) = a^(4) − y
Then propagate backward using:
δ^(l) = (Θ^(l))ᵀ δ^(l+1) .* g′(z^(l))
For sigmoid activation:
g′(z^(l)) = a^(l) .* (1 − a^(l))
So equivalently:
δ^(l) = (Θ^(l))ᵀ δ^(l+1) .* a^(l) .* (1 − a^(l))
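The recursion can be sketched for a hypothetical 2→3→1 sigmoid network (biases omitted and all sizes and values made up for illustration):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)

# Hypothetical 3-layer network: 2 inputs -> 3 hidden units -> 1 output.
Theta1 = rng.normal(size=(3, 2))   # weights into the hidden layer
Theta2 = rng.normal(size=(1, 3))   # weights into the output layer

x = np.array([0.5, -1.0])
y = np.array([1.0])

# Forward pass
z2 = Theta1 @ x
a2 = sigmoid(z2)
z3 = Theta2 @ a2
a3 = sigmoid(z3)

# Backward pass: start at the output, then propagate through the weights.
delta3 = a3 - y                                 # output-layer error: a - y
delta2 = (Theta2.T @ delta3) * a2 * (1 - a2)    # hidden error, sigmoid derivative

# Gradients of the cost with respect to each weight matrix
grad2 = np.outer(delta3, a2)
grad1 = np.outer(delta2, x)
```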
Geometric Interpretation
Think of the network as a graph:
- Nodes = neurons
- Edges = weights
- Errors flow backward through edges
To compute δ_j^(l):
- Take all connections going forward from unit j in layer l
- Multiply each weight Θ_kj^(l) by the corresponding error δ_k^(l+1)
- Sum them up
This is simply the chain rule applied repeatedly.
Example: Computing a Hidden Layer Delta
To compute δ_2^(2), the error of unit 2 in layer 2, we sum over the next layer (here assumed to have two units):
δ_2^(2) = Θ_12^(2) δ_1^(3) + Θ_22^(2) δ_2^(3)
(For the intuition, the activation-derivative factor is left out.)
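A hypothetical numerical instance of this weighted sum of downstream errors (all values made up):

```python
# Weights on the two forward connections out of unit 2 in layer 2
# (hypothetical values):
theta_12, theta_22 = 0.4, -0.6
# Error terms of the two layer-3 units those connections reach:
delta3_1, delta3_2 = 0.3, 0.1

# Weighted sum of downstream errors (activation derivative ignored):
delta2_2 = theta_12 * delta3_1 + theta_22 * delta3_2
print(delta2_2)   # 0.4*0.3 + (-0.6)*0.1 = 0.06
```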
Another Example
To compute δ_2^(3), we sum contributions from the next layer; here layer 4 is the output layer with a single unit, so there is only one term:
δ_2^(3) = Θ_12^(3) δ_1^(4)
Core Insight
Backpropagation is:
- Repeated application of the chain rule
- Error flowing from output to input
- Weighted by connection strengths
- Modulated by the activation derivative
In short:
Forward pass computes predictions.
Backward pass computes gradients.
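A minimal end-to-end sketch tying the two passes together, with toy data, hypothetical layer sizes, and biases omitted for brevity: each gradient-descent step uses a forward pass for predictions and a backward pass for gradients.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(1)

# Toy data: 4 examples, 2 features, binary labels (hypothetical values).
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
Y = np.array([[0.], [0.], [1.], [1.]])

W1 = rng.normal(size=(2, 4))   # input -> hidden weights
W2 = rng.normal(size=(4, 1))   # hidden -> output weights

lr = 0.5
costs = []
for _ in range(1000):
    # Forward pass: compute predictions.
    A1 = sigmoid(X @ W1)
    A2 = sigmoid(A1 @ W2)
    costs.append(-np.mean(Y * np.log(A2) + (1 - Y) * np.log(1 - A2)))

    # Backward pass: compute gradients via the delta recursion.
    D2 = A2 - Y                        # output-layer error
    D1 = (D2 @ W2.T) * A1 * (1 - A1)   # hidden-layer error
    W2 -= lr * A1.T @ D2 / len(X)
    W1 -= lr * X.T @ D1 / len(X)

# The cost should fall as training proceeds.
```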
