
Logistic Regression for Classification: Concept, Sigmoid Function, Cost Function, and Implementation

Complete guide to logistic regression for binary classification, including the sigmoid function, hypothesis model, cost function, decision boundary, gradient descent, and practical machine learning implementation.

Hitesh Sahu
Written by Hitesh Sahu, a passionate developer and blogger.

Fri Feb 27 2026


📊 Logistic Regression Advanced Concepts

Logistic regression is fundamentally a probabilistic classification model optimized using cross-entropy loss.

Derivation of the Sigmoid Function

We want a function with these properties:

  1. Output between 0 and 1
  2. Smooth and differentiable
  3. Monotonically increasing
  4. Interpretable as probability

We start by modeling the log-odds (logit) as linear:

$$\log\left(\frac{p}{1-p}\right) = \theta^T x$$

Where:

  • $p = P(y = 1 \mid x)$
  • $\frac{p}{1-p}$ is the odds
  • $\log\left(\frac{p}{1-p}\right)$ is the log-odds

Step 1: Remove the logarithm

Exponentiate both sides:

$$\frac{p}{1-p} = e^{\theta^T x}$$

Step 2: Solve for $p$

Multiply both sides by $(1-p)$:

$$p = (1-p)\,e^{\theta^T x}$$

Expand:

$$p = e^{\theta^T x} - p\,e^{\theta^T x}$$

Move terms:

$$p + p\,e^{\theta^T x} = e^{\theta^T x}$$

Factor:

$$p\left(1 + e^{\theta^T x}\right) = e^{\theta^T x}$$

Solve:

$$p = \frac{e^{\theta^T x}}{1 + e^{\theta^T x}}$$

Divide the numerator and denominator by $e^{\theta^T x}$ to rewrite:

$$p = \frac{1}{1 + e^{-\theta^T x}}$$

Final Result: Sigmoid Function

  • We model the log-odds as linear:
    $$\log\left(\frac{p}{1-p}\right) = \theta^T x$$

  • This leads to the sigmoid function:
    $$\sigma(z) = \frac{1}{1 + e^{-z}}$$
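The sigmoid above takes a few lines of NumPy. This is a minimal sketch (the function name and the clipping threshold are my own choices); clipping the argument keeps `exp` from overflowing for extreme inputs without changing the result at double precision:

```python
import numpy as np

def sigmoid(z):
    """Sigmoid sigma(z) = 1 / (1 + e^(-z)), vectorized over arrays."""
    z = np.asarray(z, dtype=float)
    # Clip the argument so exp() never overflows; beyond |z| ~ 35 the
    # output is already indistinguishable from 0 or 1 in double precision.
    z = np.clip(z, -35.0, 35.0)
    return 1.0 / (1.0 + np.exp(-z))
```

Note the symmetry $\sigma(-z) = 1 - \sigma(z)$ and the midpoint $\sigma(0) = 0.5$, which is where the usual 0.5 decision threshold comes from.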


Advanced Optimization for Logistic Regression

Instead of using gradient descent, we can use more advanced optimization algorithms such as:

  • Conjugate Gradient
  • BFGS
  • L-BFGS

These methods:

  • Are typically faster than plain gradient descent
  • Are more sophisticated (e.g., there is no learning rate to pick by hand)
  • Often require fewer iterations
  • Are already implemented and highly optimized in libraries

You should not implement them yourself unless you are an expert in numerical optimization.

1. What We Need to Provide

Optimization libraries require a function that returns:

  1. The cost function: $J(\theta)$
  2. The gradient: $\frac{\partial}{\partial \theta_j} J(\theta)$

We can return both from one function.
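Concretely, these are the standard logistic regression formulas (for $m$ training examples, with hypothesis $h_\theta(x) = \sigma(\theta^T x)$ from the derivation above):

$$J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \left[\, y^{(i)} \log h_\theta\big(x^{(i)}\big) + \big(1 - y^{(i)}\big) \log\left(1 - h_\theta\big(x^{(i)}\big)\right) \right]$$

$$\frac{\partial}{\partial \theta_j} J(\theta) = \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta\big(x^{(i)}\big) - y^{(i)} \right) x_j^{(i)}$$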

2. Example Cost Function

```matlab
function [jVal, gradient] = costFunction(theta)
  % Illustrative fill-in: assumes the design matrix X (m x n) and the
  % label vector y (m x 1) are in scope, e.g. captured via an anonymous
  % function handle such as @(t) costFunction(t).
  m = length(y);
  h = 1 ./ (1 + exp(-X * theta));                         % sigmoid hypothesis
  jVal = -(1/m) * (y' * log(h) + (1 - y)' * log(1 - h));  % cross-entropy J(theta)
  gradient = (1/m) * (X' * (h - y));                      % gradient of J(theta)
end
```
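The same pattern carries over to Python with `scipy.optimize.minimize`: passing `jac=True` tells SciPy the objective returns both the cost and the gradient, and `method='L-BFGS-B'` selects L-BFGS. A minimal sketch on a toy one-feature dataset (the data, variable names, and the `eps` guard against `log(0)` are illustrative choices, not from the original text):

```python
import numpy as np
from scipy.optimize import minimize

def cost_function(theta, X, y):
    """Return (J(theta), gradient) for logistic regression in one call."""
    m = len(y)
    h = 1.0 / (1.0 + np.exp(-X @ theta))   # sigmoid hypothesis
    eps = 1e-12                            # guard against log(0)
    jVal = -(1.0 / m) * (y @ np.log(h + eps) + (1 - y) @ np.log(1 - h + eps))
    gradient = (1.0 / m) * (X.T @ (h - y))
    return jVal, gradient

# Toy linearly separable data: x > 2.5 -> class 1 (bias column prepended)
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
X = np.column_stack([np.ones_like(x), x])
y = np.array([0, 0, 0, 1, 1, 1], dtype=float)

res = minimize(cost_function, x0=np.zeros(2), args=(X, y),
               jac=True, method='L-BFGS-B')
theta = res.x
preds = (1.0 / (1.0 + np.exp(-X @ theta)) >= 0.5).astype(int)
```

The optimizer handles step sizes and convergence checks internally; we only supply the cost and gradient.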


---

Multiclass Classification: One-vs-All

1. The Problem

Previously, we had:

$$y \in \{0, 1\}$$

Now suppose we have multiple classes:

$$y \in \{0, 1, 2, \dots, n\}$$

This is called multiclass classification.


One-vs-All Strategy

We solve the problem by turning it into multiple binary classification problems.

For each class $i$, we train a logistic regression classifier:

$$h_\theta^{(i)}(x) = P(y = i \mid x; \theta)$$

So we train:

$$h_\theta^{(0)}(x),\; h_\theta^{(1)}(x),\; \dots,\; h_\theta^{(n)}(x)$$

Each classifier answers:

“Is this example class $i$ or not?”

All other classes are treated as the negative class.

Training Process

For each class $i$:

  • Create new labels:
    • Positive: $y = i$
    • Negative: $y \ne i$
  • Train a logistic regression model.

This gives us $n+1$ classifiers.

Making Predictions

For a new input $x$:

  1. Compute:
     $$h_\theta^{(0)}(x),\; h_\theta^{(1)}(x),\; \dots,\; h_\theta^{(n)}(x)$$
  2. Predict the class with the highest probability:
     $$\text{prediction} = \arg\max_i h_\theta^{(i)}(x)$$

Intuition

We:

  • Pick one class
  • Combine all other classes into a single group
  • Train a binary classifier
  • Repeat for each class

This is why it is called One-vs-All (or One-vs-Rest).

Example (3 Classes)

Suppose we have:

  • Class 0 - Animal
  • Class 1 - Fish
  • Class 2 - Bird

We train:

  • Classifier 1: 0 vs (1,2)
  • Classifier 2: 1 vs (0,2)
  • Classifier 3: 2 vs (0,1)

Then for prediction, we choose the class with the largest output.
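The three-classifier scheme above can be sketched end to end in NumPy. This is a minimal sketch using plain batch gradient descent for each binary problem; the toy dataset, learning rate, and iteration count are illustrative choices:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_one_vs_all(X, y, num_classes, lr=0.1, iters=2000):
    """Train one binary logistic regression per class; rows of Theta are per-class weights."""
    m, n = X.shape
    Theta = np.zeros((num_classes, n))
    for i in range(num_classes):
        yi = (y == i).astype(float)                   # relabel: class i -> 1, rest -> 0
        for _ in range(iters):
            h = sigmoid(X @ Theta[i])
            Theta[i] -= lr * (X.T @ (h - yi)) / m     # batch gradient step
    return Theta

def predict_one_vs_all(Theta, X):
    """Pick the class whose classifier outputs the highest probability."""
    return np.argmax(sigmoid(X @ Theta.T), axis=1)

# Toy 2-D data: three well-separated clusters, one per class (bias column included)
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [4.0, 0.0], [0.0, 4.0]])
pts = np.vstack([c + 0.3 * rng.standard_normal((20, 2)) for c in centers])
X = np.column_stack([np.ones(len(pts)), pts])
y = np.repeat(np.arange(3), 20)

Theta = train_one_vs_all(X, y, num_classes=3)
acc = np.mean(predict_one_vs_all(Theta, X) == y)
```

In practice you would use a library solver (as in the optimization section above) rather than a hand-rolled loop, but the relabel-train-argmax structure is the same.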

Final Summary

Training:

Train $n+1$ logistic regression models:

$$h_\theta^{(i)}(x) = P(y = i \mid x; \theta)$$

Prediction:

$$\text{prediction} = \arg\max_i h_\theta^{(i)}(x)$$

Key Idea

One-vs-All turns a multiclass problem into multiple binary logistic regression problems and selects the class with the highest confidence.
