Logistic Regression for Classification: Concept, Sigmoid Function, Cost Function, and Implementation

Complete guide to logistic regression for binary classification, including the sigmoid function, hypothesis model, cost function, decision boundary, gradient descent, and practical machine learning implementation.

Logistic Regression

Classification

Machine Learning

Binary Classification

Supervised Learning

Sigmoid Function

📊 Logistic Regression Advanced Concepts

Logistic regression is fundamentally a probabilistic classification model optimized using cross-entropy loss.

Derivation of the Sigmoid Function

We want a function with these properties:

- Output strictly between 0 and 1
- Smooth and differentiable
- Monotonically increasing
- Interpretable as a probability

We start by modeling the log-odds (logit) as a linear function of the features:

$$\log\left(\frac{p}{1-p}\right) = \theta^T x$$

Where:

- $p = P(y=1 \mid x)$ is the probability of the positive class
- $\frac{p}{1-p}$ is the odds
- $\log\left(\frac{p}{1-p}\right)$ is the log-odds

Step 1: Remove the logarithm

Exponentiate both sides:

$$\frac{p}{1-p} = e^{\theta^T x}$$

Step 2: Solve for $p$

Multiply both sides by $(1-p)$:

$$p = (1-p)e^{\theta^T x}$$

Expand:

$$p = e^{\theta^T x} - p e^{\theta^T x}$$

Move all terms containing $p$ to the left:

$$p + p e^{\theta^T x} = e^{\theta^T x}$$

Factor:

$$p(1 + e^{\theta^T x}) = e^{\theta^T x}$$

Solve:

$$p = \frac{e^{\theta^T x}}{1 + e^{\theta^T x}}$$

Rewrite by dividing the numerator and denominator by $e^{\theta^T x}$:

$$p = \frac{1}{1 + e^{-\theta^T x}}$$
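As a quick sanity check of the algebra, here is a worked example with an assumed score of $\theta^T x = 2$ (the value is made up for illustration). Both forms of the result give the same probability:

$$
\begin{aligned}
\frac{p}{1-p} &= e^{2} \approx 7.389 \\
p &= \frac{e^{2}}{1 + e^{2}} \approx \frac{7.389}{8.389} \approx 0.881 \\
p &= \frac{1}{1 + e^{-2}} \approx \frac{1}{1.135} \approx 0.881
\end{aligned}
$$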

Final Result: Sigmoid Function

We model the log-odds as linear:

$$\log\left(\frac{p}{1-p}\right) = \theta^T x$$

Solving for $p$ gives $p = \sigma(\theta^T x)$, where $\sigma$ is the sigmoid function:

$$\sigma(z) = \frac{1}{1 + e^{-z}}$$
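To make the relationship concrete, here is a minimal Python sketch showing that the sigmoid and the logit undo each other. The function names `sigmoid` and `logit` are illustrative choices, not something the article defines, and the two-branch form of `sigmoid` is just one common way to avoid overflow for large negative inputs.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic function 1 / (1 + e^{-z}), written to avoid overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    # For negative z use the equivalent form e^z / (1 + e^z).
    ez = math.exp(z)
    return ez / (1.0 + ez)


def logit(p: float) -> float:
    """Log-odds log(p / (1 - p)); inverse of the sigmoid for 0 < p < 1."""
    return math.log(p / (1.0 - p))


for z in (-5.0, -1.0, 0.0, 1.0, 5.0):
    p = sigmoid(z)
    # logit(sigmoid(z)) should give z back, up to rounding error.
    print(f"z={z:+.1f}  p={p:.4f}  logit(p)={logit(p):+.4f}")
```

The printed probabilities stay between 0 and 1 and increase with $z$, which matches the properties listed at the start of the derivation.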

Advanced Optimization for Logistic Regression

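Instead of plain gradient descent, we can use more advanced optimization algorithms such as:

- Conjugate Gradient
- BFGS
- L-BFGS

Compared with gradient descent, these methods:

- Often converge in fewer iterations
- Typically pick the step size themselves (for example with a line search), so there is no learning rate to tune by hand
- Are more complex to implement correctly
- Are already implemented and highly optimized in numerical libraries

Use a library implementation rather than writing your own, unless you are an expert in numerical optimization.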

1. What We Need to Provide

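Optimization libraries require a function that, for a given $\theta$, returns:

- The value of the cost function: $J(\theta)$
- The gradient, that is, the partial derivative $\frac{\partial}{\partial \theta_j} J(\theta)$ for every parameter $\theta_j$

We can return both from one function.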

2. Example Cost Function

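function [jVal, gradient] = costFunction(theta, X, y)
  % Assumed inputs: X is the m-by-d design matrix, y the m-by-1 labels in {0,1}.
  m = length(y);
  h = 1 ./ (1 + exp(-X * theta));  % sigmoid of the linear scores

  jVal = -(1/m) * (y' * log(h) + (1 - y)' * log(1 - h));  % cross-entropy J(theta)

  gradient = (1/m) * X' * (h - y);  % gradient of J(theta), same size as theta

end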
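The same "one function returns cost and gradient" idea can be sketched in Python. This is a minimal illustration that assumes the cross-entropy cost for logistic regression and a tiny made-up dataset; the names `cost_and_gradient`, `X`, `y`, and `theta` are hypothetical. It uses only the standard library, and the finite-difference loop at the end is a common way to sanity-check a hand-written gradient before passing it to an optimizer.

```python
import math


def sigmoid(z):
    """Logistic function, written to avoid overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def cost_and_gradient(theta, X, y):
    """Return (J(theta), gradient) for logistic regression.

    X is a list of feature rows (with a leading 1.0 for the intercept),
    y is a list of 0/1 labels. Both are illustrative inputs.
    """
    m = len(y)
    cost = 0.0
    grad = [0.0] * len(theta)
    for xi, yi in zip(X, y):
        h = sigmoid(sum(t * v for t, v in zip(theta, xi)))
        # Cross-entropy term for this example (assumes 0 < h < 1).
        cost += -(yi * math.log(h) + (1 - yi) * math.log(1.0 - h))
        # dJ/dtheta_j accumulates (h - y) * x_j.
        for j, v in enumerate(xi):
            grad[j] += (h - yi) * v
    return cost / m, [g / m for g in grad]


# Tiny made-up dataset: intercept column plus one feature.
X = [[1.0, 0.5], [1.0, 1.5], [1.0, 3.0], [1.0, 4.0]]
y = [0, 0, 1, 1]
theta = [0.1, -0.2]

J, grad = cost_and_gradient(theta, X, y)

# Finite-difference check: the analytic gradient should match the slope of J.
eps = 1e-6
for j in range(len(theta)):
    bumped = list(theta)
    bumped[j] += eps
    numeric = (cost_and_gradient(bumped, X, y)[0] - J) / eps
    print(f"theta_{j}: analytic={grad[j]:+.6f}  numeric={numeric:+.6f}")
```

If SciPy is available, a function of this shape can usually be handed to `scipy.optimize.minimize` with `jac=True` and a method such as `"BFGS"` or `"L-BFGS-B"`, passing `X` and `y` through `args`; treat that as a pointer to check against the SciPy documentation rather than tested code.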

Multiclass Classification: One-vs-All

1. The Problem

Previously, we had two classes:

$$y \in \{0,1\}$$

Now suppose we have $n+1$ classes:

$$y \in \{0,1,2,\dots,n\}$$

This is called multiclass classification.

One-vs-All Strategy

We solve the problem by turning it into multiple binary classification problems.

For each class $i$, we train a separate logistic regression classifier, with its own parameters, that estimates:

$$h_\theta^{(i)}(x) = P(y = i \mid x; \theta)$$

So we train:

$$h_\theta^{(0)}(x), \; h_\theta^{(1)}(x), \; \dots, \; h_\theta^{(n)}(x)$$

Each classifier answers:

“Is this example class $i$ or not?”

All other classes are treated as the negative class.

Training Process

For each class $i$:

- Create new binary labels:
  - Positive (1): examples with $y = i$
  - Negative (0): examples with $y \ne i$
- Train a logistic regression model on the relabeled data.

This gives us $n+1$ classifiers, one per class.
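The relabeling step is small enough to show directly. The sketch below assumes integer class labels stored in a plain Python list; the helper name `one_vs_all_labels` and the sample labels are made up for illustration.

```python
def one_vs_all_labels(y, positive_class):
    """Return binary labels: 1 where y equals positive_class, else 0."""
    return [1 if label == positive_class else 0 for label in y]


# Made-up labels for a 3-class problem (classes 0, 1, 2).
y = [0, 2, 1, 1, 0, 2]

for i in sorted(set(y)):
    print(f"class {i} vs rest: {one_vs_all_labels(y, i)}")
```

Each relabeled list is what the binary classifier for that class would be trained on; the features themselves are left unchanged.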

Making Predictions

For a new input $x$:

Compute the output of every classifier:

$$h_\theta^{(0)}(x), \; h_\theta^{(1)}(x), \; \dots, \; h_\theta^{(n)}(x)$$

Predict the class with the highest probability:

$$\text{prediction} = \arg\max_i h_\theta^{(i)}(x)$$
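In code, the arg max is a one-liner. The sketch below assumes the classifier outputs for one input have already been computed and collected in a list indexed by class; `predict_class` and the score values are hypothetical.

```python
def predict_class(scores):
    """Return the index i with the largest classifier output h^(i)(x)."""
    return max(range(len(scores)), key=lambda i: scores[i])


# Hypothetical outputs of three one-vs-all classifiers for a single input x.
scores = [0.12, 0.81, 0.35]
print(predict_class(scores))  # prints 1
```

Note that the scores come from independently trained classifiers, so they do not have to sum to 1; only their ordering matters for the prediction. With Python's `max`, ties go to the lowest class index.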

Intuition

We:

- Pick one class
- Combine all other classes into a single group
- Train a binary classifier
- Repeat for each class

This is why it is called One-vs-All (or One-vs-Rest).

Example (3 Classes)

Suppose we have:

Class 0 - Mammal
Class 1 - Fish
Class 2 - Bird

We train:

Classifier $h_\theta^{(0)}$: class 0 vs classes 1 and 2
Classifier $h_\theta^{(1)}$: class 1 vs classes 0 and 2
Classifier $h_\theta^{(2)}$: class 2 vs classes 0 and 1

Then for prediction, we choose the class with the largest output.
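Putting the pieces together, here is a small end-to-end sketch for a three-class problem. It is an illustration under stated assumptions, not the article's implementation: the 2-D data points are made up, training uses plain batch gradient descent rather than one of the advanced optimizers, and all names (`train_binary`, `train_one_vs_all`, `predict`) and the `lr` and `iters` values are hypothetical.

```python
import math


def sigmoid(z):
    """Logistic function, written to avoid overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def train_binary(X, y, lr=0.1, iters=5000):
    """Fit one logistic regression with plain batch gradient descent."""
    theta = [0.0] * len(X[0])
    m = len(y)
    for _ in range(iters):
        grad = [0.0] * len(theta)
        for xi, yi in zip(X, y):
            err = sigmoid(sum(t * v for t, v in zip(theta, xi))) - yi
            for j, v in enumerate(xi):
                grad[j] += err * v
        theta = [t - lr * g / m for t, g in zip(theta, grad)]
    return theta


def train_one_vs_all(X, y, classes):
    """Train one binary classifier per class (class c vs everything else)."""
    return {
        c: train_binary(X, [1 if label == c else 0 for label in y])
        for c in classes
    }


def predict(models, x):
    """Pick the class whose classifier reports the highest probability."""
    return max(
        models,
        key=lambda c: sigmoid(sum(t * v for t, v in zip(models[c], x))),
    )


# Made-up 2-D points (the leading 1.0 is the intercept term),
# arranged as three well-separated clusters.
X = [
    [1.0, 0.0, 0.0], [1.0, 1.0, 0.0], [1.0, 0.0, 1.0],  # class 0
    [1.0, 4.0, 0.0], [1.0, 5.0, 1.0], [1.0, 4.0, 1.0],  # class 1
    [1.0, 0.0, 4.0], [1.0, 1.0, 5.0], [1.0, 1.0, 4.0],  # class 2
]
y = [0, 0, 0, 1, 1, 1, 2, 2, 2]

models = train_one_vs_all(X, y, classes=[0, 1, 2])

for point in ([1.0, 0.5, 0.5], [1.0, 4.5, 0.5], [1.0, 0.5, 4.5]):
    print(point[1:], "->", predict(models, point))
```

The clusters are chosen so that each class can be separated from the other two by a straight line, which is the situation where one-vs-all with linear classifiers works well.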

Final Summary

Training:

Train $n+1$ logistic regression models, one per class:

$$h_\theta^{(i)}(x) = P(y=i \mid x; \theta)$$

Prediction:

$$\text{prediction} = \arg\max_i h_\theta^{(i)}(x)$$

Key Idea

One-vs-All turns a multiclass problem into multiple binary logistic regression problems and selects the class with the highest confidence.

Written by Hitesh Sahu, a passionate developer and blogger.

Fri Feb 27 2026
