
Cost Function for Neural Networks

The cost function for neural networks generalizes the logistic regression cost to multiple output units and includes regularization over all weights in the network. This post breaks down the cost function, explaining the double and triple summations, and provides intuition for how it works.

Written by Hitesh Sahu, a passionate developer and blogger.

Fri Feb 27 2026


Notation

We define:

  • $L$ = total number of layers in the network
  • $s_l$ = number of units (excluding the bias unit) in layer $l$
  • $K$ = number of output units (classes)

For multiclass problems, the $k^{th}$ output is written as:

$$(h_\Theta(x))_k$$
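As a concrete illustration (a toy sketch with made-up numbers, not from the post): for a 3-class network, the hypothesis produces a vector of $K$ outputs per example, labels are one-hot encoded, and the $k^{th}$ output is just the $k^{th}$ component of that vector.

```python
import numpy as np

# Hypothetical output vector of a 3-class network for one example:
# h[k] plays the role of (h_Theta(x))_k for k = 1..K
h = np.array([0.1, 0.7, 0.2])

# One-hot label: the second class is the true class
y = np.array([0, 1, 0])

# The k-th output is simply the k-th component (0-based index here)
k = 1
print(h[k])  # 0.7
```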

Recall: Logistic Regression Cost Function

The regularized logistic regression cost is:

$$J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log(h_\theta(x^{(i)})) + (1 - y^{(i)}) \log(1 - h_\theta(x^{(i)})) \right] + \frac{\lambda}{2m} \sum_{j=1}^{n} \theta_j^2$$
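This cost translates almost line for line into NumPy. The sketch below assumes the usual conventions (a design matrix `X` with a leading column of ones and a bias parameter $\theta_0$ that is not regularized); the function name `logistic_cost` is my own.

```python
import numpy as np

def logistic_cost(theta, X, y, lam):
    """Regularized logistic regression cost J(theta).

    X: (m, n+1) design matrix with a leading column of ones,
    theta: (n+1,) parameter vector, y: (m,) binary labels, lam: lambda.
    """
    m = X.shape[0]
    h = 1.0 / (1.0 + np.exp(-X @ theta))             # sigmoid hypothesis
    loss = -np.mean(y * np.log(h) + (1 - y) * np.log(1 - h))
    reg = (lam / (2 * m)) * np.sum(theta[1:] ** 2)   # skip bias theta_0
    return loss + reg
```

With `theta` at zero the hypothesis is 0.5 everywhere, so the cost reduces to $\log 2 \approx 0.693$ regardless of the labels, which is a handy sanity check.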

Neural Network Cost Function

For neural networks, the cost function generalizes to:

$$J(\Theta) = -\frac{1}{m} \sum_{i=1}^{m} \sum_{k=1}^{K} \left[ y_k^{(i)} \log\big((h_\Theta(x^{(i)}))_k\big) + (1 - y_k^{(i)}) \log\big(1 - (h_\Theta(x^{(i)}))_k\big) \right] + \frac{\lambda}{2m} \sum_{l=1}^{L-1} \sum_{i=1}^{s_l} \sum_{j=1}^{s_{l+1}} \left(\Theta^{(l)}_{j,i}\right)^2$$

Key Ideas

Double Sum

The term

$$\sum_{i=1}^{m} \sum_{k=1}^{K}$$

means:

  • Loop over all training examples ($i$)
  • Loop over all output units ($k$)
  • Compute logistic loss for each output
  • Sum them together

This is simply the total loss across all output neurons.
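The four steps above can be sketched as two explicit loops (predictions `H` and one-hot labels `Y` are made-up toy data of shape $m \times K$), and then collapsed into one vectorized expression:

```python
import numpy as np

# Hypothetical predictions H[i, k] = (h_Theta(x^(i)))_k and one-hot labels Y
H = np.array([[0.8, 0.1, 0.1],
              [0.2, 0.6, 0.2]])
Y = np.array([[1, 0, 0],
              [0, 1, 0]])
m, K = H.shape

total = 0.0
for i in range(m):            # loop over all training examples
    for k in range(K):        # loop over all output units
        total += Y[i, k] * np.log(H[i, k]) + (1 - Y[i, k]) * np.log(1 - H[i, k])
J_unreg = -total / m          # unregularized neural network loss

# Vectorized equivalent of the double sum
J_vec = -np.sum(Y * np.log(H) + (1 - Y) * np.log(1 - H)) / m
```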


Triple Sum (Regularization)

The term

$$\sum_{l=1}^{L-1} \sum_{i=1}^{s_l} \sum_{j=1}^{s_{l+1}} \left(\Theta^{(l)}_{j,i}\right)^2$$

means:

  • Loop over all layers
  • Loop over all weights in each layer
  • Square every weight
  • Add them all together

Important:

  • Bias weights are not regularized.
  • The index $i$ here does not refer to training examples; it indexes the units of layer $l$.
  • This term regularizes all parameters in the entire network.
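In code, the triple sum collapses to one loop over weight matrices, squaring everything except the bias column. The shapes below are hypothetical (a 4‑input, 5‑hidden‑unit, 3‑output network); each $\Theta^{(l)}$ is $s_{l+1} \times (s_l + 1)$, with the first column multiplying the bias unit.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 3-layer network: 4 inputs -> 5 hidden units -> 3 outputs
Thetas = [rng.normal(size=(5, 4 + 1)),   # Theta^(1): 5 x (4+1)
          rng.normal(size=(3, 5 + 1))]   # Theta^(2): 3 x (5+1)

m, lam = 100, 1.0

reg = 0.0
for Theta in Thetas:                  # loop over layers l = 1..L-1
    reg += np.sum(Theta[:, 1:] ** 2)  # square all weights, skipping bias column
reg *= lam / (2 * m)
```

Slicing `Theta[:, 1:]` is what implements the "bias weights are not regularized" rule: column 0 of each matrix multiplies the bias unit and is left out of the sum.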

Intuition

The neural network cost function is simply:

  • Logistic regression loss applied to every output unit
  • Plus regularization over all weights in the network