
Cost Function for Neural Networks

The cost function for neural networks generalizes the logistic regression cost to multiple output units and includes regularization over all weights in the network. This post breaks down the cost function, explains the double and triple summations, and builds intuition for how it works.

Written by Hitesh Sahu, a passionate developer and blogger.

Fri Feb 27 2026


💰 Cost Function for Neural Networks

Notation


graph LR

%% Input Layer
    subgraph Input Layer
        x1(((x1)))
        x2(((x2)))
        x3(((x3)))
    end

%% Hidden Layer 1
    subgraph Hidden Layer 1
        a1{a1}
        a2{a2}
        a3{a3}
    end

%% Hidden Layer 2
    subgraph Hidden Layer 2
        b1{b1}
        b2{b2}
        b3{b3}
    end

%% Output Layer
    subgraph Output Layer
        y(((hθx)))
    end

%% Connections: Input → Hidden 1
    x1 --> a1
    x1 --> a2
    x1 --> a3
    x2 --> a1
    x2 --> a2
    x2 --> a3
    x3 --> a1
    x3 --> a2
    x3 --> a3
%% Connections: Hidden 1 → Hidden 2
    a1 --> b1
    a1 --> b2
    a1 --> b3
    a2 --> b1
    a2 --> b2
    a2 --> b3
    a3 --> b1
    a3 --> b2
    a3 --> b3
%% Connections: Hidden 2 → Output
    b1 --> y
    b2 --> y
    b3 --> y

Let:

  • $m$ = number of training examples: $(x_1, y_1), (x_2, y_2), \dots, (x_m, y_m)$
  • $K$ = number of output units (classes), e.g.
    • Binary classification: $K = 1$
    • Multi-class classification: $K = 4$
  • $L$ = total number of layers in the network, e.g. $L = 4$
  • $s_l$ = number of units (excluding the bias unit) in layer $l$
    • $s_1 = 3$, $s_2 = 3$, $s_3 = 3$, $s_4 = 1$

Since neural networks can have multiple output nodes, we denote the $k^{\text{th}}$ output as:

$$(h_\Theta(x))_k$$
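The notation above can be made concrete with a minimal NumPy forward pass for the 3-3-3-1 network in the diagram. This is a sketch, not the post's code: the sigmoid activation, the random weights, and the `forward` helper are all assumptions for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)

# Layer sizes: s_1 = 3 inputs, s_2 = s_3 = 3 hidden units, s_4 = 1 output (L = 4, K = 1)
sizes = [3, 3, 3, 1]

# Theta^(l) maps layer l to layer l+1; shape (s_{l+1}, s_l + 1), the +1 for the bias unit
thetas = [rng.standard_normal((sizes[l + 1], sizes[l] + 1)) for l in range(3)]

def forward(x, thetas):
    a = x
    for theta in thetas:
        a = sigmoid(theta @ np.concatenate(([1.0], a)))  # prepend the bias unit a_0 = 1
    return a  # vector of outputs (h_Theta(x))_k for k = 1..K

x = np.array([0.5, -1.2, 0.3])
h = forward(x, thetas)
print(h.shape)  # (1,) — one output unit, so K = 1
```

Each weight matrix has one more column than the layer it reads from, because of the bias unit; that extra column is exactly what the regularization term later skips.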

Neural Network Cost Function

Logistic Regression Cost Function

The regularized logistic regression cost is:

$$J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log\big(h_\theta(x^{(i)})\big) + \big(1 - y^{(i)}\big) \log\big(1 - h_\theta(x^{(i)})\big) \right] + \frac{\lambda}{2m} \sum_{j=1}^{n} \theta_j^2$$
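As a quick sanity check, the logistic regression cost can be written directly from the formula. The function name `logistic_cost` and the toy data are assumptions; the convention that $\theta_0$ (the bias) is excluded from regularization follows the $\sum_{j=1}^{n}$ index starting at 1.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logistic_cost(theta, X, y, lam):
    """Regularized logistic regression cost J(theta).

    X: (m, n+1) design matrix with a leading column of ones,
    y: (m,) labels in {0, 1}, lam: regularization strength lambda.
    """
    m = X.shape[0]
    h = sigmoid(X @ theta)
    loss = -np.mean(y * np.log(h) + (1 - y) * np.log(1 - h))
    reg = (lam / (2 * m)) * np.sum(theta[1:] ** 2)  # skip the bias term theta_0
    return loss + reg

X = np.array([[1.0, 0.5], [1.0, -1.0], [1.0, 2.0]])  # m = 3, bias column included
y = np.array([1.0, 0.0, 1.0])
print(logistic_cost(np.zeros(2), X, y, lam=1.0))  # with theta = 0 every h is 0.5, so J = log 2 ≈ 0.6931
```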

For a neural network with $K$ output units, the cost becomes:

$$J(\Theta) = -\frac{1}{m} \sum_{i=1}^{m} \sum_{k=1}^{K} \left[ y_k^{(i)} \log\big((h_\Theta(x^{(i)}))_k\big) + \big(1 - y_k^{(i)}\big) \log\big(1 - (h_\Theta(x^{(i)}))_k\big) \right] + \frac{\lambda}{2m} \sum_{l=1}^{L-1} \sum_{i=1}^{s_l} \sum_{j=1}^{s_{l+1}} \big(\Theta^{(l)}_{j,i}\big)^2$$
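The full cost can be sketched in a few lines of NumPy. This is an illustrative implementation, not the post's own code: the name `nn_cost`, the layout of `thetas` (bias in column 0), and the one-hot target matrix `Y` are assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def nn_cost(thetas, X, Y, lam):
    """J(Theta) for a feed-forward network.

    thetas: list of weight matrices, Theta^(l) of shape (s_{l+1}, s_l + 1)
    X: (m, n) inputs; Y: (m, K) one-hot (or binary) targets.
    """
    m = X.shape[0]
    A = X
    for theta in thetas:
        A = sigmoid(np.hstack([np.ones((m, 1)), A]) @ theta.T)  # add bias units, propagate
    # Double sum: logistic loss over all m examples and all K output units
    loss = -np.sum(Y * np.log(A) + (1 - Y) * np.log(1 - A)) / m
    # Triple sum: squared weights in every layer, excluding the bias columns
    reg = (lam / (2 * m)) * sum(np.sum(t[:, 1:] ** 2) for t in thetas)
    return loss + reg
```

With all weights at zero, every output is 0.5 and the cost reduces to $K \log 2$ — a handy check that the double sum really contributes one logistic loss per output unit.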

Intuition

Neural network cost function =

  • Logistic regression loss applied to every output unit
  • Plus regularization over all weights in the network

This is simply a natural extension of logistic regression to multiple layers and multiple outputs.

In short:

$$\text{Neural Network Cost} = \text{Multiclass Logistic Loss} + \text{Regularization}$$

Double Sum

The double sum adds up logistic regression losses across all output units.

$$\sum_{i=1}^{m} \sum_{k=1}^{K}$$

  • The outer sum loops over the training examples ($i$).
  • The inner sum loops over the output units ($k$).
  • We compute a logistic regression loss for each output node,
  • then add them all together.

This is simply the total loss across all output neurons.

So this is essentially the sum of $K$ logistic regression costs per example.
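That claim is easy to verify numerically: the double sum over an $(m, K)$ matrix of predictions equals $K$ separate column-wise logistic losses added together. The matrices `H` and `Y` below are made-up data for the check.

```python
import numpy as np

rng = np.random.default_rng(1)
m, K = 5, 3
H = rng.uniform(0.01, 0.99, size=(m, K))            # predicted (h_Theta(x^(i)))_k
Y = rng.integers(0, 2, size=(m, K)).astype(float)   # binary targets y_k^(i)

# Double sum, computed directly over i and k
double_sum = -np.sum(Y * np.log(H) + (1 - Y) * np.log(1 - H)) / m

# Same quantity as K separate logistic regression losses, one per output unit
per_unit = [-np.sum(Y[:, k] * np.log(H[:, k]) + (1 - Y[:, k]) * np.log(1 - H[:, k])) / m
            for k in range(K)]

print(np.isclose(double_sum, sum(per_unit)))  # True
```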

Triple Sum (Regularization)

The triple sum adds up squared weights across the entire network.

The term

$$\sum_{l=1}^{L-1} \sum_{i=1}^{s_l} \sum_{j=1}^{s_{l+1}} \big(\Theta^{(l)}_{j,i}\big)^2$$

means:

  • Loop over all layers $l = 1, \dots, L-1$.
  • Loop over all units $i$ in layer $l$ and all units $j$ in layer $l+1$.
  • Square every weight in every $\Theta$ matrix.
  • Add them all together.

Important:

  • Bias weights are not regularized.
  • The index iii here does not refer to training examples.
  • This term regularizes all parameters in the entire network.
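The three bullet points above translate directly into nested loops (or a vectorized slice). A minimal sketch, assuming each $\Theta^{(l)}$ is stored with its bias weights in column 0 — the column the sum skips:

```python
import numpy as np

def reg_term(thetas, lam, m):
    """Triple sum: lambda/(2m) * sum of squared non-bias weights."""
    total = 0.0
    for theta in thetas:                    # loop over layers l = 1..L-1
        total += np.sum(theta[:, 1:] ** 2)  # skip the bias column, square, add up
    return lam / (2 * m) * total

thetas = [np.ones((3, 4)), np.ones((1, 4))]  # a 3-3-1 slice of a network, all weights 1
# Non-bias weights: 3*3 + 1*3 = 12 ones, so the sum of squares is 12
print(reg_term(thetas, lam=2.0, m=4))  # 2/(2*4) * 12 = 3.0
```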
