
Cost Function for Neural Networks

The cost function for neural networks generalizes the logistic regression cost to multiple output units and includes regularization over all weights in the network. This post breaks down the cost function, explains the double and triple summations, and builds intuition for how it works.

Written by Hitesh Sahu, a passionate developer and blogger.

Fri Feb 27 2026


💰 Cost Function for Neural Networks

Notation


graph LR

%% Input Layer
    subgraph Input Layer
        x1(((x1)))
        x2(((x2)))
        x3(((x3)))
    end

%% Hidden Layer 1
    subgraph Hidden Layer 1
        a1{a1}
        a2{a2}
        a3{a3}
    end

%% Hidden Layer 2
    subgraph Hidden Layer 2
        b1{b1}
        b2{b2}
        b3{b3}
    end

%% Output Layer
    subgraph Output Layer
        y(((hθx)))
    end

%% Connections: Input → Hidden 1
    x1 --> a1
    x1 --> a2
    x1 --> a3
    x2 --> a1
    x2 --> a2
    x2 --> a3
    x3 --> a1
    x3 --> a2
    x3 --> a3
%% Connections: Hidden 1 → Hidden 2
    a1 --> b1
    a1 --> b2
    a1 --> b3
    a2 --> b1
    a2 --> b2
    a2 --> b3
    a3 --> b1
    a3 --> b2
    a3 --> b3
%% Connections: Hidden 2 → Output
    b1 --> y
    b2 --> y
    b3 --> y

Let:

  • $m$ = number of training examples: $(x_1, y_1), (x_2, y_2), \dots, (x_m, y_m)$
  • $K$ = number of output units (classes), e.g.
    • Binary classification: $K = 1$
    • Multi-class classification: $K = 4$
  • $L$ = total number of layers in the network, e.g. $L = 4$
  • $s_l$ = number of units (excluding the bias unit) in layer $l$
    • $s_1 = 3$, $s_2 = 3$, $s_3 = 3$, $s_4 = 1$

Since neural networks can have multiple output nodes, we denote the $k^{\text{th}}$ output as:

$$(h_\Theta(x))_k$$
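The notation above can be made concrete with a minimal NumPy forward pass for the 3-3-3-1 network in the diagram. This is a sketch, not the post's code: the sigmoid activation, the random weights, and the `forward` helper are all assumptions for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)

# Layer sizes: s_1 = 3 inputs, s_2 = s_3 = 3 hidden units, s_4 = 1 output (L = 4, K = 1)
sizes = [3, 3, 3, 1]

# Theta^(l) maps layer l to layer l+1; shape (s_{l+1}, s_l + 1), the +1 for the bias unit
thetas = [rng.standard_normal((sizes[l + 1], sizes[l] + 1)) for l in range(3)]

def forward(x, thetas):
    a = x
    for theta in thetas:
        a = sigmoid(theta @ np.concatenate(([1.0], a)))  # prepend the bias unit a_0 = 1
    return a  # vector of outputs (h_Theta(x))_k for k = 1..K

x = np.array([0.5, -1.2, 0.3])
h = forward(x, thetas)
print(h.shape)  # (1,) — one output unit, so K = 1
```

Each weight matrix has one more column than the layer it reads from, because of the bias unit; that extra column is exactly what the regularization term later skips.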

Neural Network Cost Function

Logistic Regression Cost Function

The regularized logistic regression cost is:

$$J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log\big(h_\theta(x^{(i)})\big) + \big(1 - y^{(i)}\big) \log\big(1 - h_\theta(x^{(i)})\big) \right] + \frac{\lambda}{2m} \sum_{j=1}^{n} \theta_j^2$$
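As a quick sanity check, the logistic regression cost can be written directly from the formula. The function name `logistic_cost` and the toy data are assumptions; the convention that $\theta_0$ (the bias) is excluded from regularization follows the $\sum_{j=1}^{n}$ index starting at 1.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logistic_cost(theta, X, y, lam):
    """Regularized logistic regression cost J(theta).

    X: (m, n+1) design matrix with a leading column of ones,
    y: (m,) labels in {0, 1}, lam: regularization strength lambda.
    """
    m = X.shape[0]
    h = sigmoid(X @ theta)
    loss = -np.mean(y * np.log(h) + (1 - y) * np.log(1 - h))
    reg = (lam / (2 * m)) * np.sum(theta[1:] ** 2)  # skip the bias term theta_0
    return loss + reg

X = np.array([[1.0, 0.5], [1.0, -1.0], [1.0, 2.0]])  # m = 3, bias column included
y = np.array([1.0, 0.0, 1.0])
print(logistic_cost(np.zeros(2), X, y, lam=1.0))  # with theta = 0 every h is 0.5, so J = log 2 ≈ 0.6931
```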

For a neural network with $K$ output units, the cost becomes:

$$J(\Theta) = -\frac{1}{m} \sum_{i=1}^{m} \sum_{k=1}^{K} \left[ y_k^{(i)} \log\big((h_\Theta(x^{(i)}))_k\big) + \big(1 - y_k^{(i)}\big) \log\big(1 - (h_\Theta(x^{(i)}))_k\big) \right] + \frac{\lambda}{2m} \sum_{l=1}^{L-1} \sum_{i=1}^{s_l} \sum_{j=1}^{s_{l+1}} \big(\Theta^{(l)}_{j,i}\big)^2$$
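The full cost can be sketched in a few lines of NumPy. This is an illustrative implementation, not the post's own code: the name `nn_cost`, the layout of `thetas` (bias in column 0), and the one-hot target matrix `Y` are assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def nn_cost(thetas, X, Y, lam):
    """J(Theta) for a feed-forward network.

    thetas: list of weight matrices, Theta^(l) of shape (s_{l+1}, s_l + 1)
    X: (m, n) inputs; Y: (m, K) one-hot (or binary) targets.
    """
    m = X.shape[0]
    A = X
    for theta in thetas:
        A = sigmoid(np.hstack([np.ones((m, 1)), A]) @ theta.T)  # add bias units, propagate
    # Double sum: logistic loss over all m examples and all K output units
    loss = -np.sum(Y * np.log(A) + (1 - Y) * np.log(1 - A)) / m
    # Triple sum: squared weights in every layer, excluding the bias columns
    reg = (lam / (2 * m)) * sum(np.sum(t[:, 1:] ** 2) for t in thetas)
    return loss + reg
```

With all weights at zero, every output is 0.5 and the cost reduces to $K \log 2$ — a handy check that the double sum really contributes one logistic loss per output unit.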

Intuition

Neural network cost function =

  • Logistic regression loss applied to every output unit
  • Plus regularization over all weights in the network

This is simply a natural extension of logistic regression to multiple layers and multiple outputs.

In short:

$$\text{Neural Network Cost} = \text{Multiclass Logistic Loss} + \text{Regularization}$$

Double Sum

The double sum adds up logistic regression losses across all output units.

$$\sum_{i=1}^{m} \sum_{k=1}^{K}$$

  • The outer sum loops over the training examples ($i$).
  • The inner sum loops over the output units ($k$).
  • We compute a logistic regression loss for each output node,
  • then add them all together.

This is simply the total loss across all output neurons.

So this is essentially the sum of $K$ logistic regression costs per example.
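That claim is easy to verify numerically: the double sum over an $(m, K)$ matrix of predictions equals $K$ separate column-wise logistic losses added together. The matrices `H` and `Y` below are made-up data for the check.

```python
import numpy as np

rng = np.random.default_rng(1)
m, K = 5, 3
H = rng.uniform(0.01, 0.99, size=(m, K))            # predicted (h_Theta(x^(i)))_k
Y = rng.integers(0, 2, size=(m, K)).astype(float)   # binary targets y_k^(i)

# Double sum, computed directly over i and k
double_sum = -np.sum(Y * np.log(H) + (1 - Y) * np.log(1 - H)) / m

# Same quantity as K separate logistic regression losses, one per output unit
per_unit = [-np.sum(Y[:, k] * np.log(H[:, k]) + (1 - Y[:, k]) * np.log(1 - H[:, k])) / m
            for k in range(K)]

print(np.isclose(double_sum, sum(per_unit)))  # True
```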

Triple Sum (Regularization)

The triple sum adds up squared weights across the entire network.

The term

$$\sum_{l=1}^{L-1} \sum_{i=1}^{s_l} \sum_{j=1}^{s_{l+1}} \big(\Theta^{(l)}_{j,i}\big)^2$$

means:

  • Loop over all layers $l = 1, \dots, L-1$.
  • Loop over all units $i$ in layer $l$ and all units $j$ in layer $l+1$.
  • Square every weight in every $\Theta$ matrix.
  • Add them all together.

Important:

  • Bias weights are not regularized.
  • The index iii here does not refer to training examples.
  • This term regularizes all parameters in the entire network.
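The three bullet points above translate directly into nested loops (or a vectorized slice). A minimal sketch, assuming each $\Theta^{(l)}$ is stored with its bias weights in column 0 — the column the sum skips:

```python
import numpy as np

def reg_term(thetas, lam, m):
    """Triple sum: lambda/(2m) * sum of squared non-bias weights."""
    total = 0.0
    for theta in thetas:                    # loop over layers l = 1..L-1
        total += np.sum(theta[:, 1:] ** 2)  # skip the bias column, square, add up
    return lam / (2 * m) * total

thetas = [np.ones((3, 4)), np.ones((1, 4))]  # a 3-3-1 slice of a network, all weights 1
# Non-bias weights: 3*3 + 1*3 = 12 ones, so the sum of squares is 12
print(reg_term(thetas, lam=2.0, m=4))  # 2/(2*4) * 12 = 3.0
```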
