Cost Function for Neural Networks
The cost function for neural networks generalizes the logistic regression cost to multiple output units and includes regularization over all weights in the network. This post breaks down the cost function, explaining the double and triple summations and providing intuition for how it works.
Notation
```mermaid
graph LR
    %% Input Layer
    subgraph Input Layer
        x1(((x1)))
        x2(((x2)))
        x3(((x3)))
    end
    %% Hidden Layer 1
    subgraph Hidden Layer 1
        a1{a1}
        a2{a2}
        a3{a3}
    end
    %% Hidden Layer 2
    subgraph Hidden Layer 2
        b1{b1}
        b2{b2}
        b3{b3}
    end
    %% Output Layer
    subgraph Output Layer
        y(((hθx)))
    end
    %% Connections: Input → Hidden 1
    x1 --> a1
    x1 --> a2
    x1 --> a3
    x2 --> a1
    x2 --> a2
    x2 --> a3
    x3 --> a1
    x3 --> a2
    x3 --> a3
    %% Connections: Hidden 1 → Hidden 2
    a1 --> b1
    a1 --> b2
    a1 --> b3
    a2 --> b1
    a2 --> b2
    a2 --> b3
    a3 --> b1
    a3 --> b2
    a3 --> b3
    %% Connections: Hidden 2 → Output
    b1 --> y
    b2 --> y
    b3 --> y
```
Let:
- $m$ = number of training examples
- $K$ = number of output units (classes), e.g.
  - Binary classification: $K = 1$
  - Multi-class classification: $K = 4$
- $L$ = total number of layers in the network, e.g. $L = 4$
- $s_l$ = number of units (excluding the bias unit) in layer $l$
- For the network above: $s_1 = 3$, $s_2 = 3$, $s_3 = 3$, $s_4 = 1$
Since neural networks can have multiple output nodes, we denote the output as a vector $h_\Theta(x) \in \mathbb{R}^K$, where $(h_\Theta(x))_k$ is the output of the $k$-th unit.
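With $K > 1$ output units, the targets $y^{(i)}$ are usually encoded as one-hot vectors in $\mathbb{R}^K$. A minimal sketch of that encoding (the `one_hot` helper is hypothetical, not from the original post):

```python
import numpy as np

def one_hot(labels, K):
    """Return an (m, K) matrix whose i-th row is the one-hot encoding of labels[i]."""
    m = len(labels)
    Y = np.zeros((m, K))
    Y[np.arange(m), labels] = 1.0  # set a single 1 at the true class index
    return Y

# Three examples with class labels 2, 0, 3 out of K = 4 classes
Y = one_hot([2, 0, 3], K=4)
```

Each row then plays the role of the vector $y^{(i)}$ that the cost function compares against the $K$ network outputs.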
Neural Network Cost Function
Logistic Regression Cost Function
The regularized logistic regression cost is:

$$ J(\theta) = -\frac{1}{m}\sum_{i=1}^{m}\left[ y^{(i)} \log h_\theta(x^{(i)}) + \left(1 - y^{(i)}\right) \log\left(1 - h_\theta(x^{(i)})\right) \right] + \frac{\lambda}{2m}\sum_{j=1}^{n} \theta_j^2 $$

For a neural network with $K$ output units, the cost becomes:

$$ J(\Theta) = -\frac{1}{m}\sum_{i=1}^{m}\sum_{k=1}^{K}\left[ y_k^{(i)} \log\left(h_\Theta(x^{(i)})\right)_k + \left(1 - y_k^{(i)}\right) \log\left(1 - \left(h_\Theta(x^{(i)})\right)_k\right) \right] + \frac{\lambda}{2m}\sum_{l=1}^{L-1}\sum_{i=1}^{s_l}\sum_{j=1}^{s_{l+1}} \left(\Theta_{ji}^{(l)}\right)^2 $$
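The whole cost can be computed in a few lines of NumPy. This is a minimal sketch, assuming the targets `Y` and network outputs `H` are `(m, K)` matrices and each weight matrix $\Theta^{(l)}$ stores its bias weights in column 0 (the function name and this layout are assumptions, not from the original post):

```python
import numpy as np

def nn_cost(Y, H, Thetas, lam):
    """Regularized neural-network cost.

    Y      : (m, K) one-hot targets
    H      : (m, K) network outputs h_Theta(x^(i))_k
    Thetas : list of weight matrices, bias weights in column 0
    lam    : regularization strength lambda
    """
    m = Y.shape[0]
    # Double sum: logistic loss over every example i and every output unit k
    loss = -np.sum(Y * np.log(H) + (1 - Y) * np.log(1 - H)) / m
    # Triple sum: squared weights over every layer, excluding the bias column
    reg = (lam / (2 * m)) * sum(np.sum(T[:, 1:] ** 2) for T in Thetas)
    return loss + reg
```

The two terms in the return value correspond exactly to the double sum and the triple sum in the formula above.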
Intuition
Neural network cost function =
- Logistic regression loss applied to every output unit
- Plus regularization over all weights in the network
This is simply a natural extension of logistic regression to multiple layers and multiple outputs.
In short: the network behaves like $K$ logistic regression classifiers sharing one set of features, and the cost sums their losses plus a penalty on all non-bias weights.
Double Sum
The double sum adds up logistic regression losses across all output units.
- The outer sum loops over training examples ($i = 1, \dots, m$)
- The inner sum loops over output units ($k = 1, \dots, K$)
- We compute a logistic regression loss for each output node,
- then add them together.
This is simply the sum of logistic regression losses over all output neurons, per example, averaged over the training set.
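The double sum can be made concrete with explicit loops. This illustrative helper (name and array layout are assumptions) mirrors the formula term by term, with the outer loop over examples and the inner loop over output units:

```python
import numpy as np

def double_sum_loss(Y, H):
    """Unregularized NN loss via explicit loops; Y, H are (m, K) arrays."""
    m, K = Y.shape
    total = 0.0
    for i in range(m):        # outer sum: training examples i = 1..m
        for k in range(K):    # inner sum: output units k = 1..K
            total += -(Y[i, k] * np.log(H[i, k])
                       + (1 - Y[i, k]) * np.log(1 - H[i, k]))
    return total / m
```

In practice the loops are vectorized, but the loop form makes the "sum over examples, then over output units" structure explicit.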
Triple Sum (Regularization)
The triple sum adds up squared weights across the entire network.
The term

$$ \frac{\lambda}{2m}\sum_{l=1}^{L-1}\sum_{i=1}^{s_l}\sum_{j=1}^{s_{l+1}} \left(\Theta_{ji}^{(l)}\right)^2 $$

means:
- Loop over all layers $l = 1, \dots, L-1$
- Loop over all units $i$ in layer $l$ and all units $j$ in layer $l+1$
- Square every weight in every matrix $\Theta^{(l)}$
- Add them all together
Important:
- Bias weights are not regularized (the sum over $i$ starts at 1, skipping the bias terms $\Theta_{j0}^{(l)}$).
- The index $i$ here does not refer to training examples; it indexes units in layer $l$.
- This term regularizes all non-bias parameters in the entire network.
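A loop-based sketch of the regularization term, assuming each $\Theta^{(l)}$ is an $(s_{l+1}) \times (s_l + 1)$ matrix whose first column holds the bias weights (the helper name is hypothetical):

```python
import numpy as np

def triple_sum_reg(Thetas, lam, m):
    """Regularization term via explicit loops, skipping bias column 0."""
    total = 0.0
    for T in Thetas:                  # loop over layers l = 1..L-1
        rows, cols = T.shape
        for j in range(rows):         # units j in layer l+1
            for i in range(1, cols):  # units i in layer l; column 0 = bias, skipped
                total += T[j, i] ** 2
    return lam / (2 * m) * total
```

Starting the inner loop at index 1 is exactly how "bias weights are not regularized" shows up in code.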
