Multiclass Classification with Neural Networks

Learn how to extend binary classification to multiclass classification using neural networks, where the output layer consists of multiple units representing different classes, and the final prediction is made by selecting the class with the highest output value.

Written by Hitesh Sahu, a passionate developer and blogger.

Fri Feb 27 2026


Extending Binary Classification

In binary classification, our hypothesis outputs a single value:

h_\Theta(x) \in [0,1]

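For multiclass classification with K classes, the hypothesis instead returns a vector of K values, one per class.
With sigmoid output units, each value lies in [0,1] and estimates how likely it is that x belongs to that class.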
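Here is a minimal NumPy sketch of the difference, assuming sigmoid output units as in logistic regression. The pre-activation values `z_binary` and `z_multi` are made up for illustration. Because each sigmoid unit is computed independently, each of the K values lies in (0, 1), but the values are not forced to sum to 1.

```python
import numpy as np

def sigmoid(z):
    """Logistic function, applied element-wise."""
    return 1.0 / (1.0 + np.exp(-z))

# Binary classification: one output unit -> one scalar in (0, 1).
z_binary = 0.8                                  # hypothetical pre-activation
h_binary = sigmoid(z_binary)

# Multiclass classification: K = 4 output units -> a vector of 4 values.
z_multi = np.array([-2.0, -1.5, 3.0, -0.5])     # hypothetical pre-activations
h_multi = sigmoid(z_multi)

print(h_binary)        # a single number
print(h_multi.shape)   # (4,)
print(h_multi.sum())   # about 1.63 here; independent sigmoids need not sum to 1
```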

Simplified Cost

If we ignore multiple output classes and regularization, the cost for a single training example $t$ is the logistic regression (cross-entropy) cost:

\text{cost}(t) = -\left[ y^{(t)} \log(h_\Theta(x^{(t)})) + (1 - y^{(t)}) \log(1 - h_\Theta(x^{(t)})) \right]
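To make the formula concrete, here is a small sketch of the per-example cost for a single sigmoid output. It uses the conventional leading minus sign, so the cost is non-negative and shrinks as the prediction approaches the label. The `eps` clipping is an added assumption that keeps `log(0)` from occurring.

```python
import numpy as np

def example_cost(y, h, eps=1e-12):
    """Cross-entropy cost for one training example with one sigmoid output.

    y   -- true label, 0 or 1
    h   -- network output h_Theta(x), a value in (0, 1)
    eps -- clipping margin (illustrative choice) so log() never sees 0
    """
    h = np.clip(h, eps, 1.0 - eps)
    return -(y * np.log(h) + (1 - y) * np.log(1 - h))

# Confident and correct -> small cost; confident and wrong -> large cost.
print(example_cost(1, 0.9))   # about 0.105
print(example_cost(1, 0.1))   # about 2.303
```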

Example: Four-Class Classification

Suppose we want to classify an image into one of four categories:

🚗 Car
🏃 Pedestrian
🚚 Truck
🛵 Motorcycle

Instead of one output unit, we use four output units.

Network Structure

\text{Input Image} \rightarrow \text{Hidden Layers} \rightarrow \text{4 Output Units}

Each output unit corresponds to one class.


graph LR

%% Styling
    classDef input fill:#1e293b,stroke:#38bdf8,color:#ffffff,stroke-width:2px;
    classDef output fill:#111827,stroke:#f59e0b,color:#ffffff,stroke-width:2px;


%% ===== Input Layer =====
    subgraph "Input Layer"
        x0(((x0=1)))
        x1(((x1)))
        x2(((x2)))
        x3(((x3)))
    end

    %% ===== Hidden Layer =====
    subgraph "Hidden Layer"
        a0(((a0=1)))
        a1{a1}
        a2{a2}
        a3{a3}
    end

    %% ===== Output Layer =====
    subgraph "Output Layer (4 Classes)"
        y1(((hθx1 = 🚗)))
        y2(((hθx2 = 🏃)))
        y3(((hθx3 = 🚚)))
        y4(((hθx4 = 🛵)))
    end


    %% Input → Hidden
    x0 --> a1
    x0 --> a2
    x0 --> a3
    x1 --> a1
    x1 --> a2
    x1 --> a3
    x2 --> a1
    x2 --> a2
    x2 --> a3
    x3 --> a1
    x3 --> a2
    x3 --> a3

    %% Hidden → Output
    a0 --> y1
    a0 --> y2
    a0 --> y3
    a0 --> y4
    a1 --> y1
    a1 --> y2
    a1 --> y3
    a1 --> y4
    a2 --> y1
    a2 --> y2
    a2 --> y3
    a2 --> y4
    a3 --> y1
    a3 --> y2
    a3 --> y3
    a3 --> y4


%% Assign classes
class x1,x2,x3 input
class y1,y2,y3,y4 output
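A forward pass through the network in the diagram can be sketched in NumPy as follows. The sketch assumes sigmoid activations in every layer and the bias units x0 = 1 and a0 = 1. The weight matrices `Theta1` (3 × 4) and `Theta2` (4 × 4) are random placeholders, not trained values. Their shapes follow the usual rule that a layer with s_{j+1} units receives an s_{j+1} × (s_j + 1) weight matrix, where the +1 accounts for the bias unit.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, Theta1, Theta2):
    """Return the 4 output activations h_Theta(x) for one input vector x."""
    a_in = np.concatenate(([1.0], x))              # add bias unit x0 = 1
    a_hidden = sigmoid(Theta1 @ a_in)              # hidden units a1..a3
    a_hidden = np.concatenate(([1.0], a_hidden))   # add bias unit a0 = 1
    return sigmoid(Theta2 @ a_hidden)              # one output per class

rng = np.random.default_rng(0)
Theta1 = rng.normal(size=(3, 4))   # 3 hidden units, 3 inputs + bias
Theta2 = rng.normal(size=(4, 4))   # 4 output units, 3 hidden units + bias

x = np.array([0.5, -1.2, 2.0])     # hypothetical input features x1..x3
h = forward(x, Theta1, Theta2)
print(h.shape)   # (4,): one value per class (Car, Pedestrian, Truck, Motorcycle)
```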

Output Representation

Our hypothesis now returns:

h_\Theta(x) = \begin{bmatrix} h_\Theta(x)_1 \\ h_\Theta(x)_2 \\ h_\Theta(x)_3 \\ h_\Theta(x)_4 \end{bmatrix}

Where:

$h_\Theta(x)_1$ → Probability of Car
$h_\Theta(x)_2$ → Probability of Pedestrian
$h_\Theta(x)_3$ → Probability of Truck
$h_\Theta(x)_4$ → Probability of Motorcycle

Training Labels (One-Hot Encoding)

Each training example has a label vector:

y^{(i)} \in \mathbb{R}^4

Examples:

Car:

\begin{bmatrix} 1 \\ 0 \\ 0 \\ 0 \end{bmatrix}

Motorcycle:

\begin{bmatrix} 0 \\ 0 \\ 0 \\ 1 \end{bmatrix}

This is called one-hot encoding.
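One-hot label vectors can be built from integer class labels with a few lines of NumPy. This sketch assumes classes are numbered 1 to K, using the same numbering as this article (Car = 1, Pedestrian = 2, Truck = 3, Motorcycle = 4). The helper name `one_hot` is illustrative. Each row of the result is one label vector y^{(i)}, written as a row.

```python
import numpy as np

def one_hot(labels, K):
    """Map 1-based class labels to rows of a (len(labels), K) one-hot matrix."""
    labels = np.asarray(labels)
    Y = np.zeros((labels.size, K))
    Y[np.arange(labels.size), labels - 1] = 1.0   # shift to 0-based columns
    return Y

Y = one_hot([1, 4, 3], K=4)   # Car, Motorcycle, Truck
print(Y)
# [[1. 0. 0. 0.]
#  [0. 0. 0. 1.]
#  [0. 0. 1. 0.]]
```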

Example Output

Suppose the network outputs:

h_\Theta(x) = \begin{bmatrix} 0 \\ 0 \\ 1 \\ 0 \end{bmatrix}

This means:

h_\Theta(x)_3 = 1

So the predicted class is the third category.

If we number the classes as:

1 → Car
2 → Pedestrian
3 → Truck
4 → Motorcycle

Then the model predicts:

\text{Truck}

Decision Rule

In practice, the outputs are rarely exactly 0 or 1, so we select:

\text{Prediction} = \arg\max_k h_\Theta(x)_k

That is, we choose the class with the largest output value.
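The decision rule is a single `np.argmax` call. This sketch assumes the class order Car, Pedestrian, Truck, Motorcycle defined above. Because NumPy indices start at 0, the 1-based class number is the argmax plus 1. The output vectors are made-up examples.

```python
import numpy as np

CLASSES = ["Car", "Pedestrian", "Truck", "Motorcycle"]   # index 0..3 <-> class 1..4

def predict(h):
    """Return the name of the class whose output unit is largest."""
    k = int(np.argmax(h))          # 0-based index of the largest output
    return CLASSES[k]

print(predict(np.array([0.0, 0.0, 1.0, 0.0])))      # Truck (the idealized output above)
print(predict(np.array([0.12, 0.08, 0.71, 0.30])))  # Truck: 0.71 is the largest value

# For a batch, take the argmax along each row.
H = np.array([[0.9, 0.1, 0.2, 0.1],
              [0.1, 0.2, 0.3, 0.8]])
print(np.argmax(H, axis=1) + 1)   # [1 4]: 1-based class numbers per row
```

If two outputs tie for the maximum, `np.argmax` returns the first index.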

Key Idea

Binary classification → 1 output unit
Multiclass classification → K output units
Output layer size = number of classes
Final prediction = index of largest output

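A neural network extends logistic regression to multiple classes by using one output unit per class: each output unit acts as a one-vs-all logistic classifier, and all of them share the features computed by the hidden layers.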
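That idea can be checked numerically in a small sketch, assuming sigmoid output units. Computing all K outputs with one matrix product gives the same numbers as running K separate logistic regressions (one per row of the output weight matrix) on the shared hidden activations. The hidden activations `a_hidden` and the weights `Theta2` are random placeholders.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(1)
K = 4                                                  # number of classes
a_hidden = np.concatenate(([1.0], rng.random(3)))     # bias a0 = 1 plus 3 hidden units
Theta2 = rng.normal(size=(K, a_hidden.size))          # one weight row per class

# Whole output layer at once.
h_layer = sigmoid(Theta2 @ a_hidden)

# K independent logistic regressions, one per class, on the same features.
h_separate = np.array([sigmoid(theta_k @ a_hidden) for theta_k in Theta2])

print(np.allclose(h_layer, h_separate))   # True
```

Going from binary to multiclass only changes the number of rows in `Theta2` (from 1 to K). The hidden layers stay the same and are shared by all classes.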
