
Logistic Regression for Classification: Concept, Sigmoid Function, Cost Function, and Implementation

Complete guide to logistic regression for binary classification, including the sigmoid function, hypothesis model, cost function, decision boundary, gradient descent, and practical machine learning implementation.

Hitesh Sahu
Written by Hitesh Sahu, a passionate developer and blogger.

Fri Feb 27 2026


📊 Logistic Regression Advanced Concepts

Logistic regression is fundamentally a probabilistic classification model optimized using cross-entropy loss.

Derivation of the Sigmoid Function

We want a function with these properties:

  1. Output between 0 and 1
  2. Smooth and differentiable
  3. Monotonically increasing
  4. Interpretable as probability

We start by modeling the log-odds (logit) as linear:

$$\log\left(\frac{p}{1-p}\right) = \theta^T x$$

Where:

  • $p = P(y = 1 \mid x)$
  • $\frac{p}{1-p}$ is the odds
  • $\log\left(\frac{p}{1-p}\right)$ is the log-odds

Step 1: Remove the logarithm

Exponentiate both sides:

$$\frac{p}{1-p} = e^{\theta^T x}$$

Step 2: Solve for $p$

Multiply both sides by $(1-p)$:

$$p = (1-p)\,e^{\theta^T x}$$

Expand:

$$p = e^{\theta^T x} - p\,e^{\theta^T x}$$

Move terms:

$$p + p\,e^{\theta^T x} = e^{\theta^T x}$$

Factor:

$$p\left(1 + e^{\theta^T x}\right) = e^{\theta^T x}$$

Solve:

$$p = \frac{e^{\theta^T x}}{1 + e^{\theta^T x}}$$

Divide the numerator and denominator by $e^{\theta^T x}$ to rewrite:

$$p = \frac{1}{1 + e^{-\theta^T x}}$$

Final Result: Sigmoid Function

  • We model the log-odds as linear:
    $$\log\left(\frac{p}{1-p}\right) = \theta^T x$$

  • This leads to the sigmoid function:
    $$\sigma(z) = \frac{1}{1 + e^{-z}}$$
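The sigmoid above takes a few lines of NumPy. This is a minimal sketch (the function name and the clipping threshold are my own choices); clipping the argument keeps `exp` from overflowing for extreme inputs without changing the result at double precision:

```python
import numpy as np

def sigmoid(z):
    """Sigmoid sigma(z) = 1 / (1 + e^(-z)), vectorized over arrays."""
    z = np.asarray(z, dtype=float)
    # Clip the argument so exp() never overflows; beyond |z| ~ 35 the
    # output is already indistinguishable from 0 or 1 in double precision.
    z = np.clip(z, -35.0, 35.0)
    return 1.0 / (1.0 + np.exp(-z))
```

Note the symmetry $\sigma(-z) = 1 - \sigma(z)$ and the midpoint $\sigma(0) = 0.5$, which is where the usual 0.5 decision threshold comes from.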


Advanced Optimization for Logistic Regression

Instead of using gradient descent, we can use more advanced optimization algorithms such as:

  • Conjugate Gradient
  • BFGS
  • L-BFGS

These methods:

  • Are typically faster than plain gradient descent
  • Are more sophisticated (e.g., there is no learning rate to pick by hand)
  • Often require fewer iterations
  • Are already implemented and highly optimized in libraries

You should not implement them yourself unless you are an expert in numerical optimization.

1. What We Need to Provide

Optimization libraries require a function that returns:

  1. The cost function: $J(\theta)$
  2. The gradient: $\frac{\partial}{\partial \theta_j} J(\theta)$

We can return both from one function.
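Concretely, these are the standard logistic regression formulas (for $m$ training examples, with hypothesis $h_\theta(x) = \sigma(\theta^T x)$ from the derivation above):

$$J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \left[\, y^{(i)} \log h_\theta\big(x^{(i)}\big) + \big(1 - y^{(i)}\big) \log\left(1 - h_\theta\big(x^{(i)}\big)\right) \right]$$

$$\frac{\partial}{\partial \theta_j} J(\theta) = \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta\big(x^{(i)}\big) - y^{(i)} \right) x_j^{(i)}$$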

2. Example Cost Function

```matlab
function [jVal, gradient] = costFunction(theta)
  % Illustrative fill-in: assumes the design matrix X (m x n) and the
  % label vector y (m x 1) are in scope, e.g. captured via an anonymous
  % function handle such as @(t) costFunction(t).
  m = length(y);
  h = 1 ./ (1 + exp(-X * theta));                         % sigmoid hypothesis
  jVal = -(1/m) * (y' * log(h) + (1 - y)' * log(1 - h));  % cross-entropy J(theta)
  gradient = (1/m) * (X' * (h - y));                      % gradient of J(theta)
end
```
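The same pattern carries over to Python with `scipy.optimize.minimize`: passing `jac=True` tells SciPy the objective returns both the cost and the gradient, and `method='L-BFGS-B'` selects L-BFGS. A minimal sketch on a toy one-feature dataset (the data, variable names, and the `eps` guard against `log(0)` are illustrative choices, not from the original text):

```python
import numpy as np
from scipy.optimize import minimize

def cost_function(theta, X, y):
    """Return (J(theta), gradient) for logistic regression in one call."""
    m = len(y)
    h = 1.0 / (1.0 + np.exp(-X @ theta))   # sigmoid hypothesis
    eps = 1e-12                            # guard against log(0)
    jVal = -(1.0 / m) * (y @ np.log(h + eps) + (1 - y) @ np.log(1 - h + eps))
    gradient = (1.0 / m) * (X.T @ (h - y))
    return jVal, gradient

# Toy linearly separable data: x > 2.5 -> class 1 (bias column prepended)
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
X = np.column_stack([np.ones_like(x), x])
y = np.array([0, 0, 0, 1, 1, 1], dtype=float)

res = minimize(cost_function, x0=np.zeros(2), args=(X, y),
               jac=True, method='L-BFGS-B')
theta = res.x
preds = (1.0 / (1.0 + np.exp(-X @ theta)) >= 0.5).astype(int)
```

The optimizer handles step sizes and convergence checks internally; we only supply the cost and gradient.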


---

Multiclass Classification: One-vs-All

1. The Problem

Previously, we had:

$$y \in \{0, 1\}$$

Now suppose we have multiple classes:

$$y \in \{0, 1, 2, \dots, n\}$$

This is called multiclass classification.


One-vs-All Strategy

We solve the problem by turning it into multiple binary classification problems.

For each class $i$, we train a logistic regression classifier:

$$h_\theta^{(i)}(x) = P(y = i \mid x; \theta)$$

So we train:

$$h_\theta^{(0)}(x),\; h_\theta^{(1)}(x),\; \dots,\; h_\theta^{(n)}(x)$$

Each classifier answers:

“Is this example class $i$ or not?”

All other classes are treated as the negative class.

Training Process

For each class $i$:

  • Create new labels:
    • Positive: $y = i$
    • Negative: $y \ne i$
  • Train a logistic regression model.

This gives us $n+1$ classifiers.

Making Predictions

For a new input $x$:

  1. Compute:
     $$h_\theta^{(0)}(x),\; h_\theta^{(1)}(x),\; \dots,\; h_\theta^{(n)}(x)$$
  2. Predict the class with the highest probability:
     $$\text{prediction} = \arg\max_i h_\theta^{(i)}(x)$$

Intuition

We:

  • Pick one class
  • Combine all other classes into a single group
  • Train a binary classifier
  • Repeat for each class

This is why it is called One-vs-All (or One-vs-Rest).

Example (3 Classes)

Suppose we have:

  • Class 0 - Animal
  • Class 1 - Fish
  • Class 2 - Bird

We train:

  • Classifier 1: 0 vs (1,2)
  • Classifier 2: 1 vs (0,2)
  • Classifier 3: 2 vs (0,1)

Then for prediction, we choose the class with the largest output.
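The three-classifier scheme above can be sketched end to end in NumPy. This is a minimal sketch using plain batch gradient descent for each binary problem; the toy dataset, learning rate, and iteration count are illustrative choices:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_one_vs_all(X, y, num_classes, lr=0.1, iters=2000):
    """Train one binary logistic regression per class; rows of Theta are per-class weights."""
    m, n = X.shape
    Theta = np.zeros((num_classes, n))
    for i in range(num_classes):
        yi = (y == i).astype(float)                   # relabel: class i -> 1, rest -> 0
        for _ in range(iters):
            h = sigmoid(X @ Theta[i])
            Theta[i] -= lr * (X.T @ (h - yi)) / m     # batch gradient step
    return Theta

def predict_one_vs_all(Theta, X):
    """Pick the class whose classifier outputs the highest probability."""
    return np.argmax(sigmoid(X @ Theta.T), axis=1)

# Toy 2-D data: three well-separated clusters, one per class (bias column included)
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [4.0, 0.0], [0.0, 4.0]])
pts = np.vstack([c + 0.3 * rng.standard_normal((20, 2)) for c in centers])
X = np.column_stack([np.ones(len(pts)), pts])
y = np.repeat(np.arange(3), 20)

Theta = train_one_vs_all(X, y, num_classes=3)
acc = np.mean(predict_one_vs_all(Theta, X) == y)
```

In practice you would use a library solver (as in the optimization section above) rather than a hand-rolled loop, but the relabel-train-argmax structure is the same.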

Final Summary

Training:

Train $n+1$ logistic regression models:

$$h_\theta^{(i)}(x) = P(y = i \mid x; \theta)$$

Prediction:

$$\text{prediction} = \arg\max_i h_\theta^{(i)}(x)$$

Key Idea

One-vs-All turns a multiclass problem into multiple binary logistic regression problems and selects the class with the highest confidence.
