Logistic Regression for Classification: Concept, Sigmoid Function, Cost Function, and Implementation
Complete guide to logistic regression for binary classification, including the sigmoid function, hypothesis model, cost function, decision boundary, gradient descent, and practical machine learning implementation.
📊 Logistic Regression Advanced Concepts
Logistic regression is fundamentally a probabilistic classification model, trained by minimizing a cross-entropy loss.
Derivation of the Sigmoid Function
We want a function with these properties:
- Output between 0 and 1
- Smooth and differentiable
- Monotonically increasing
- Interpretable as probability
We start by modeling the log-odds (logit) of $p = P(y = 1 \mid x)$ as a linear function of the features:

$$\log\frac{p}{1-p} = \theta^T x$$

Where:

- $\frac{p}{1-p}$ is the odds
- $\log\frac{p}{1-p}$ is the log-odds

Step 1: Remove the logarithm

Exponentiate both sides:

$$\frac{p}{1-p} = e^{\theta^T x}$$

Step 2: Solve for $p$

Multiply both sides by $1 - p$:

$$p = (1 - p)\,e^{\theta^T x}$$

Expand:

$$p = e^{\theta^T x} - p\,e^{\theta^T x}$$

Move terms:

$$p + p\,e^{\theta^T x} = e^{\theta^T x}$$

Factor:

$$p\left(1 + e^{\theta^T x}\right) = e^{\theta^T x}$$

Solve:

$$p = \frac{e^{\theta^T x}}{1 + e^{\theta^T x}}$$

Rewrite (divide numerator and denominator by $e^{\theta^T x}$):

$$p = \frac{1}{1 + e^{-\theta^T x}}$$
Final Result: Sigmoid Function

- We model the log-odds as linear:

$$\log\frac{p}{1-p} = \theta^T x$$

- This leads to the sigmoid (logistic) function:

$$h_\theta(x) = \sigma(\theta^T x) = \frac{1}{1 + e^{-\theta^T x}}$$
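The algebra above is easy to verify numerically. A minimal Python sketch (the `theta` and `x` values are invented for illustration) applies the sigmoid to a linear logit and recovers the log-odds:

```python
import numpy as np

def sigmoid(z):
    """The logistic function sigma(z) = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

# Illustrative parameters: theta and x are arbitrary example values.
theta = np.array([0.5, -1.0])
x = np.array([2.0, 0.5])

z = theta @ x          # the linear log-odds (logit), theta^T x
p = sigmoid(z)         # a probability in (0, 1)

# Inverting the sigmoid recovers the logit, as in the derivation above.
assert np.isclose(np.log(p / (1 - p)), z)
```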
Advanced Optimization for Logistic Regression
Instead of using gradient descent, we can use more advanced optimization algorithms such as:
- Conjugate Gradient
- BFGS
- L-BFGS
These methods:
- Are faster and more sophisticated
- Often require fewer iterations
- Are already implemented and highly optimized in libraries
You should not implement them yourself unless you are an expert in numerical optimization.
1. What We Need to Provide
Optimization libraries require a function that returns:
- The cost function:

$$J(\theta) = -\frac{1}{m}\sum_{i=1}^{m}\left[y^{(i)}\log h_\theta(x^{(i)}) + \left(1 - y^{(i)}\right)\log\left(1 - h_\theta(x^{(i)})\right)\right]$$

- The gradient:

$$\frac{\partial J(\theta)}{\partial \theta_j} = \frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_j^{(i)}$$

We can return both from one function.
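To make the two quantities concrete, here is a small NumPy sketch (the four-example dataset is invented) that computes the cost and gradient, then checks the analytic gradient against a finite-difference approximation:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost_and_gradient(theta, X, y):
    """Cross-entropy cost J(theta) and its gradient."""
    m = len(y)
    h = sigmoid(X @ theta)
    J = -(1 / m) * np.sum(y * np.log(h) + (1 - y) * np.log(1 - h))
    grad = (1 / m) * X.T @ (h - y)
    return J, grad

# Invented four-example dataset; the first column is the intercept feature.
X = np.array([[1., 0.], [1., 1.], [1., 2.], [1., 3.]])
y = np.array([0., 0., 1., 1.])
theta = np.zeros(2)

J, grad = cost_and_gradient(theta, X, y)
# With theta = 0, every prediction is 0.5, so J(0) = log(2).

# Finite-difference check of each partial derivative.
eps = 1e-6
for j in range(len(theta)):
    e = np.zeros_like(theta)
    e[j] = eps
    approx = (cost_and_gradient(theta + e, X, y)[0]
              - cost_and_gradient(theta - e, X, y)[0]) / (2 * eps)
    assert np.isclose(grad[j], approx, atol=1e-6)
```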
2. Example Cost Function
```matlab
function [jVal, gradient] = costFunction(theta)
  jVal = ...      % code to compute J(theta)
  gradient = ...  % code to compute gradient of J(theta)
end
```
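In Python, SciPy's optimizers play the same role. The sketch below (toy data and function names are mine) returns the cost and gradient from one function and hands it to L-BFGS via `scipy.optimize.minimize` with `jac=True`:

```python
import numpy as np
from scipy.optimize import minimize

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost_function(theta, X, y):
    """Return (jVal, gradient), analogous to the Octave template above."""
    m = len(y)
    h = sigmoid(X @ theta)
    jVal = -(1 / m) * np.sum(y * np.log(h) + (1 - y) * np.log(1 - h))
    gradient = (1 / m) * X.T @ (h - y)
    return jVal, gradient

# Invented, slightly noisy 1-D data (first column is the intercept).
X = np.array([[1., 0.], [1., 1.], [1., 1.5], [1., 2.], [1., 2.5], [1., 3.]])
y = np.array([0., 0., 1., 1., 0., 1.])

# jac=True tells the optimizer that the objective returns both
# the cost and its gradient.
res = minimize(cost_function, x0=np.zeros(2), args=(X, y),
               jac=True, method="L-BFGS-B")
theta_opt = res.x   # fitted parameters
```

The key point is the interface: supplying the analytic gradient (rather than letting the library approximate it numerically) is what makes these methods fast.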
---
Multiclass Classification: One-vs-All
1. The Problem
Previously, we had a binary label:

$$y \in \{0, 1\}$$

Now suppose we have multiple classes:

$$y \in \{0, 1, 2, \ldots, K-1\}$$
This is called multiclass classification.
One-vs-All Strategy
We solve the problem by turning it into multiple binary classification problems.
For each class $i$, we train a logistic regression classifier:

$$h_\theta^{(i)}(x) = P(y = i \mid x; \theta)$$

So we train $K$ classifiers, one for each class.

Each classifier answers:

"Is this example class $i$ or not?"

All other classes are treated as the negative class.
Training Process
For each class $i$:

- Create new labels:
  - Positive ($y = 1$): examples belonging to class $i$
  - Negative ($y = 0$): examples of every other class
- Train a logistic regression model on the relabeled data.

This gives us $K$ classifiers.
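The relabeling step can be sketched in one line of NumPy (the label vector and class index below are invented):

```python
import numpy as np

y = np.array([0, 2, 1, 0, 2, 1])   # invented multiclass labels
i = 2                               # the class currently treated as positive

# Positive (1) for class i, negative (0) for every other class.
y_binary = (y == i).astype(float)
```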
Making Predictions
For a new input $x$:

- Compute $h_\theta^{(i)}(x)$ for every class $i$
- Predict the class with the highest probability:

$$\hat{y} = \arg\max_i\, h_\theta^{(i)}(x)$$
Intuition
We:
- Pick one class
- Combine all other classes into a single group
- Train a binary classifier
- Repeat for each class
This is why it is called One-vs-All (or One-vs-Rest).
Example (3 Classes)
Suppose we have:
- Class 0 - Animal
- Class 1 - Fish
- Class 2 - Bird
We train:
- Classifier 1: 0 vs (1,2)
- Classifier 2: 1 vs (0,2)
- Classifier 3: 2 vs (0,1)
Then for prediction, we choose the class with the largest output.
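Putting the whole strategy together, here is a self-contained NumPy sketch (toy 2-D clusters and plain gradient descent; all names and data are invented) that trains one classifier per class and predicts by taking the argmax:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_binary(X, y, lr=0.5, iters=2000):
    """Plain batch gradient descent on the logistic regression cost."""
    theta = np.zeros(X.shape[1])
    m = len(y)
    for _ in range(iters):
        theta -= lr * (1 / m) * X.T @ (sigmoid(X @ theta) - y)
    return theta

def one_vs_all(X, y, num_classes):
    """One row of parameters per class: class k vs everything else."""
    return np.array([train_binary(X, (y == k).astype(float))
                     for k in range(num_classes)])

def predict(Theta, X):
    """Choose the class whose classifier reports the highest probability."""
    return np.argmax(sigmoid(X @ Theta.T), axis=1)

# Invented toy data: three well-separated 2-D clusters, 20 points each.
rng = np.random.default_rng(0)
centers = np.array([[0., 0.], [4., 0.], [0., 4.]])
pts = np.vstack([rng.normal(c, 0.3, size=(20, 2)) for c in centers])
y = np.repeat(np.arange(3), 20)
X = np.column_stack([np.ones(len(pts)), pts])   # prepend intercept column

Theta = one_vs_all(X, y, 3)   # three binary classifiers, one per class
pred = predict(Theta, X)      # argmax over the three probabilities
```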
Final Summary
Training:

Train $K$ logistic regression models, one per class:

$$h_\theta^{(i)}(x) = P(y = i \mid x; \theta), \quad i = 0, 1, \ldots, K-1$$

Prediction:

$$\hat{y} = \arg\max_i\, h_\theta^{(i)}(x)$$
Key Idea
One-vs-All turns a multiclass problem into multiple binary logistic regression problems and selects the class with the highest confidence.
