Logistic Regression for Classification: Concept, Sigmoid Function, Cost Function, and Implementation

Complete guide to logistic regression for binary classification, including the sigmoid function, hypothesis model, cost function, decision boundary, gradient descent, and practical machine learning implementation.

Hitesh Sahu
Written by Hitesh Sahu, a passionate developer and blogger.

Fri Feb 27 2026

📊 Logistic Regression Advanced Concepts

Logistic regression is fundamentally a probabilistic classification model optimized using cross-entropy loss.

Derivation of the Sigmoid Function

We want a function with these properties:

  1. Output between 0 and 1
  2. Smooth and differentiable
  3. Monotonically increasing
  4. Interpretable as probability

We start by modeling the log-odds (logit) as linear:

$$\log\left(\frac{p}{1-p}\right) = \theta^T x$$

Where:

  • $p = P(y=1 \mid x)$
  • $\frac{p}{1-p}$ is the odds
  • $\log\left(\frac{p}{1-p}\right)$ is the log-odds

Step 1: Remove the logarithm

Exponentiate both sides:

$$\frac{p}{1-p} = e^{\theta^T x}$$

Step 2: Solve for $p$

Multiply both sides by $(1-p)$:

$$p = (1-p)\,e^{\theta^T x}$$

Expand:

$$p = e^{\theta^T x} - p\,e^{\theta^T x}$$

Move terms:

$$p + p\,e^{\theta^T x} = e^{\theta^T x}$$

Factor:

$$p\left(1 + e^{\theta^T x}\right) = e^{\theta^T x}$$

Solve:

$$p = \frac{e^{\theta^T x}}{1 + e^{\theta^T x}}$$

Rewrite by dividing the numerator and denominator by $e^{\theta^T x}$:

$$p = \frac{1}{1 + e^{-\theta^T x}}$$

Final Result: Sigmoid Function

  • We model the log-odds as linear in $x$:
    $\log\left(\frac{p}{1-p}\right) = \theta^T x$

  • Solving for $p$ gives the sigmoid function evaluated at $z = \theta^T x$: $\sigma(z) = \frac{1}{1 + e^{-z}}$
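
As a quick numerical check of the derivation, the following Python sketch implements the sigmoid and its inverse, the logit. The function names `sigmoid` and `logit` are chosen here for illustration. The two branches of `sigmoid` are the two equivalent forms from the derivation, $\frac{1}{1+e^{-z}}$ and $\frac{e^{z}}{1+e^{z}}$; picking the branch by the sign of $z$ is a common way to avoid overflow in `exp`.

```python
import math

def sigmoid(z: float) -> float:
    """Sigmoid 1 / (1 + e^{-z}), evaluated in an overflow-safe way."""
    if z >= 0:
        # Form 1 / (1 + e^{-z}); e^{-z} <= 1 here, so no overflow.
        return 1.0 / (1.0 + math.exp(-z))
    # Equivalent form e^{z} / (1 + e^{z}); e^{z} < 1 here.
    ez = math.exp(z)
    return ez / (1.0 + ez)

def logit(p: float) -> float:
    """Log-odds log(p / (1 - p)) of a probability p in (0, 1)."""
    return math.log(p / (1.0 - p))

for z in (-3.0, 0.0, 2.5):
    p = sigmoid(z)
    # logit(sigmoid(z)) should recover z up to floating-point error.
    print(z, p, logit(p))
```

Applying `logit` to the output of `sigmoid` recovers the original input, which is exactly the statement that the sigmoid is the inverse of the log-odds.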


Advanced Optimization for Logistic Regression

Instead of using gradient descent, we can use more advanced optimization algorithms such as:

  • Conjugate Gradient
  • BFGS
  • L-BFGS

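Compared with plain gradient descent, these methods:

  • Usually converge faster
  • Are more sophisticated, typically choosing the step size automatically instead of relying on a hand-tuned learning rate
  • Often require fewer iterations
  • Are already implemented and highly optimized in libraries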

You should not implement them yourself unless you are an expert in numerical optimization.

1. What We Need to Provide

Optimization libraries require a function that returns:

  1. The cost function: $J(\theta)$
  2. The gradient: $\frac{\partial}{\partial \theta_j} J(\theta)$ for each parameter $\theta_j$

We can return both from one function.
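
For logistic regression with the cross-entropy loss, these two quantities take the standard form below. This is given as a reference sketch: the symbols $m$ (number of training examples) and $(x^{(k)}, y^{(k)})$ (the $k$-th training example) are notation assumed here, and $h_\theta(x) = \sigma(\theta^T x)$ is the sigmoid hypothesis.

$$J(\theta) = -\frac{1}{m} \sum_{k=1}^{m} \left[ y^{(k)} \log h_\theta(x^{(k)}) + \left(1 - y^{(k)}\right) \log\left(1 - h_\theta(x^{(k)})\right) \right]$$

$$\frac{\partial}{\partial \theta_j} J(\theta) = \frac{1}{m} \sum_{k=1}^{m} \left( h_\theta(x^{(k)}) - y^{(k)} \right) x_j^{(k)}$$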

2. Example Cost Function

function [jVal, gradient] = costFunction(theta)
  jVal = ...      % code to compute J(theta)
  gradient = ...  % code to compute the gradient of J(theta)
end
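
To make the template concrete, here is a minimal Python sketch of a function that returns both the cost and the gradient for unregularized logistic regression. It is an illustration under assumptions, not the author's implementation: the name `cost_function`, the extra arguments `X` and `y`, and the toy data are all made up here, and each row of `X` is assumed to start with a bias feature of 1.0.

```python
import math

def sigmoid(z):
    """Overflow-safe sigmoid."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def cost_function(theta, X, y):
    """Return (J(theta), gradient) for unregularized logistic regression."""
    m = len(y)
    eps = 1e-12  # keeps log() away from 0 when h saturates
    cost = 0.0
    grad = [0.0] * len(theta)
    for x_k, y_k in zip(X, y):
        h = sigmoid(sum(t * v for t, v in zip(theta, x_k)))
        h_safe = min(max(h, eps), 1.0 - eps)
        # Cross-entropy contribution of this example.
        cost -= y_k * math.log(h_safe) + (1 - y_k) * math.log(1.0 - h_safe)
        # Gradient contribution: (h - y) * x_j for every feature j.
        for j, v in enumerate(x_k):
            grad[j] += (h - y_k) * v
    return cost / m, [g / m for g in grad]

# Hypothetical toy data: bias feature plus one real feature.
X = [[1.0, 0.5], [1.0, 2.0], [1.0, -1.0]]
y = [0, 1, 0]

cost, grad = cost_function([0.0, 0.0], X, y)
print(cost, grad)
```

At $\theta = 0$ every prediction is 0.5, so the cost equals $\log 2$ regardless of the data, which makes a handy sanity check. If SciPy is available, `scipy.optimize.minimize` accepts a function of this shape when called with `jac=True` (for example with `method="L-BFGS-B"`); that call is not exercised here.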



Multiclass Classification: One-vs-All

1. The Problem

Previously, we had:

$$y \in \{0, 1\}$$

Now suppose we have multiple classes:

$$y \in \{0, 1, 2, \dots, n\}$$

This is called multiclass classification. With labels running from $0$ to $n$, there are $n+1$ classes.


One-vs-All Strategy

We solve the problem by turning it into multiple binary classification problems.

For each class $i$, we train a logistic regression classifier:

$$h_\theta^{(i)}(x) = P(y = i \mid x; \theta)$$

So we train:

$$h_\theta^{(0)}(x), \; h_\theta^{(1)}(x), \; \dots, \; h_\theta^{(n)}(x)$$

Each classifier answers:

“Is this example class $i$ or not?”

All other classes are treated as the negative class.

Training Process

For each class $i$:

  • Create new labels:
    • Positive: $y = i$
    • Negative: $y \ne i$
  • Train a logistic regression model.

This gives us $n+1$ classifiers.
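
The relabeling step can be sketched in a few lines of Python. The helper name `one_vs_all_labels` and the example label list are hypothetical, chosen only to show how one multiclass label vector turns into one binary label vector per class.

```python
def one_vs_all_labels(y, target_class):
    """Map multiclass labels to binary labels: 1 for target_class, 0 otherwise."""
    return [1 if label == target_class else 0 for label in y]

# Hypothetical multiclass labels for five training examples.
y = [0, 2, 1, 2, 0]
classes = sorted(set(y))

# One binary label vector per class; each would train its own classifier.
binary_targets = {c: one_vs_all_labels(y, c) for c in classes}
for c, labels in binary_targets.items():
    print(c, labels)
```

Each binary label vector is then paired with the unchanged feature matrix to train one logistic regression model.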

Making Predictions

For a new input $x$:

  1. Compute:
     $h_\theta^{(0)}(x), \; h_\theta^{(1)}(x), \; \dots, \; h_\theta^{(n)}(x)$
  2. Predict the class with the highest probability:
     $\text{prediction} = \arg\max_i h_\theta^{(i)}(x)$
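
The prediction rule can be sketched as follows in Python, assuming the classifiers have already been trained. The parameter vectors in `all_theta` and the input `x` are made-up numbers for illustration, and the first feature of `x` is assumed to be a bias term of 1.0.

```python
import math

def sigmoid(z):
    """Overflow-safe sigmoid."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def predict(all_theta, x):
    """Return (best_class, scores), where scores[i] is classifier i's output for x."""
    scores = [sigmoid(sum(t * v for t, v in zip(theta, x))) for theta in all_theta]
    best_class = max(range(len(scores)), key=scores.__getitem__)
    return best_class, scores

# Hypothetical trained parameters, one vector per class (classes 0, 1, 2).
all_theta = [
    [1.0, -2.0, 0.0],
    [-1.0, 1.0, 1.0],
    [0.0, 0.5, -1.5],
]
x = [1.0, 2.0, 0.5]  # bias feature followed by two real features

predicted, scores = predict(all_theta, x)
print(predicted, scores)
```

Because each classifier is trained independently, the scores generally do not sum to 1. They are compared only to find the most confident classifier.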

Intuition

We:

  • Pick one class
  • Combine all other classes into a single group
  • Train a binary classifier
  • Repeat for each class

This is why it is called One-vs-All (or One-vs-Rest).

Example (3 Classes)

Suppose we have:

  • Class 0 - Animal
  • Class 1 - Fish
  • Class 2 - Bird

We train:

  • $h_\theta^{(0)}$: class 0 vs classes (1,2)
  • $h_\theta^{(1)}$: class 1 vs classes (0,2)
  • $h_\theta^{(2)}$: class 2 vs classes (0,1)

Then for prediction, we choose the class with the largest output.
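
Putting the pieces together, here is a small end-to-end sketch of one-vs-all on a three-class toy problem. Everything specific in it is an assumption made for illustration: the data points, the function names, and the use of plain batch gradient descent with a fixed learning rate of 0.1 for 5000 iterations (a library optimizer would normally replace that loop).

```python
import math

def sigmoid(z):
    """Overflow-safe sigmoid."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def dot(theta, x):
    return sum(t * v for t, v in zip(theta, x))

def train_binary(X, y, lr=0.1, iters=5000):
    """Batch gradient descent for one binary logistic regression classifier."""
    theta = [0.0] * len(X[0])
    m = len(y)
    for _ in range(iters):
        grad = [0.0] * len(theta)
        for x_k, y_k in zip(X, y):
            err = sigmoid(dot(theta, x_k)) - y_k
            for j, v in enumerate(x_k):
                grad[j] += err * v
        theta = [t - lr * g / m for t, g in zip(theta, grad)]
    return theta

def train_one_vs_all(X, y):
    """Train one classifier per class, treating all other classes as negative."""
    return {c: train_binary(X, [1 if label == c else 0 for label in y])
            for c in sorted(set(y))}

def predict(models, x):
    """Pick the class whose classifier gives the largest output."""
    return max(models, key=lambda c: sigmoid(dot(models[c], x)))

# Made-up toy data: bias feature 1.0 plus two features, three separated clusters.
X = [[1.0, 0.0, 0.0], [1.0, 1.0, 0.0], [1.0, 0.0, 1.0],   # class 0
     [1.0, 5.0, 0.0], [1.0, 6.0, 0.0], [1.0, 5.0, 1.0],   # class 1
     [1.0, 0.0, 5.0], [1.0, 1.0, 5.0], [1.0, 0.0, 6.0]]   # class 2
y = [0, 0, 0, 1, 1, 1, 2, 2, 2]

models = train_one_vs_all(X, y)
train_predictions = [predict(models, x) for x in X]
print(train_predictions)
```

The clusters are deliberately well separated so that each class can be split from the rest by a straight line. On real data the classes may overlap, and one-vs-all can then produce ambiguous regions where no classifier is confident.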

Final Summary

Training:

Train $n+1$ logistic regression models:

$$h_\theta^{(i)}(x) = P(y = i \mid x; \theta)$$

Prediction:

$$\text{prediction} = \arg\max_i h_\theta^{(i)}(x)$$

Key Idea

One-vs-All turns a multiclass problem into multiple binary logistic regression problems and selects the class with the highest confidence.
