Training a Neural Network
In this post, we will put together all the pieces we've learned about neural networks to understand how to train a neural network effectively. We will cover the cost function, backpropagation, gradient checking, and random initialization, along with key intuitions for each step.
Putting It Together
Now that we have covered forward propagation, backpropagation, and gradient checking, let’s combine everything into a complete training pipeline.
1. 🔀 Choose a Network Architecture
First, decide the structure of your neural network:
- Number of layers
- Number of hidden units per layer
- Number of output units
How to Choose the Architecture
- Input layer size = dimension of feature vector
- Output layer size = number of output classes
- Hidden units:
  - More units usually perform better, but at greater computational cost
- Default choice:
  - Use 1 hidden layer
  - If using multiple hidden layers, use the same number of units in each layer
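The guidelines above can be written down concretely. As a sketch (the specific sizes — 20×20-pixel digit images with 10 classes — are an illustrative assumption, not prescribed by the text):

```python
# Hypothetical sizes for a handwritten-digit classifier:
# 20x20-pixel images flattened to 400 features, 10 digit classes.
input_layer_size = 400   # input layer size = dimension of the feature vector
hidden_layer_size = 25   # one hidden layer is a reasonable default
num_labels = 10          # output layer size = number of output classes

# Each layer transition gets one weight matrix; the +1 column is for the bias.
theta1_shape = (hidden_layer_size, input_layer_size + 1)
theta2_shape = (num_labels, hidden_layer_size + 1)
```

Note how the architecture alone fixes the shapes of all weight matrices before any training happens.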
2. 📚 Training a Neural Network
2.1 🎲 Randomly Initialize Weights
Initialize each weight randomly to a small value near zero (but not exactly zero).
This breaks the symmetry between hidden units and allows each to learn a different function.
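A minimal NumPy sketch of random initialization (the function name `rand_init` and the value `epsilon = 0.12` are our own choices — a common heuristic, not mandated by the text):

```python
import numpy as np

def rand_init(l_out, l_in, epsilon=0.12):
    """Weights drawn uniformly from [-epsilon, epsilon].

    All-zero initialization would leave every hidden unit computing the
    same function; small random values break that symmetry.
    The extra column (l_in + 1) multiplies the bias unit.
    """
    return np.random.uniform(-epsilon, epsilon, size=(l_out, l_in + 1))

Theta1 = rand_init(25, 400)  # hidden layer: 25 units, 400 inputs (+ bias)
```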
2.2 ⏩ Forward Propagation (FP)
For each training example $(x^{(i)}, y^{(i)})$, compute the activations $a^{(l)}$ for every layer, ending with the output $h_\Theta(x^{(i)})$.
This gives the network's prediction.
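Forward propagation for a single example can be sketched like this (a sigmoid activation is assumed, as in the rest of these notes; the helper names are ours):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward_prop(x, Theta1, Theta2):
    """Compute activations layer by layer for one example x."""
    a1 = np.concatenate(([1.0], x))                      # input + bias unit
    a2 = np.concatenate(([1.0], sigmoid(Theta1 @ a1)))   # hidden layer + bias
    a3 = sigmoid(Theta2 @ a2)                            # output = h_Theta(x)
    return a1, a2, a3
```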
2.3 💰 Implement the Cost Function
Compute the regularized cost:

$$J(\Theta) = -\frac{1}{m}\sum_{i=1}^{m}\sum_{k=1}^{K}\left[y_k^{(i)}\log\big(h_\Theta(x^{(i)})\big)_k + \big(1-y_k^{(i)}\big)\log\Big(1-\big(h_\Theta(x^{(i)})\big)_k\Big)\right] + \frac{\lambda}{2m}\sum_{l=1}^{L-1}\sum_{i=1}^{s_l}\sum_{j=1}^{s_{l+1}}\big(\Theta_{j,i}^{(l)}\big)^2$$

This includes:
- The logistic loss summed over all $K$ output units
- A regularization term over all weights (bias terms excluded)
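The cost can be computed directly from the predictions and labels. A minimal sketch (the function name `nn_cost` is ours; it assumes `H` holds predicted probabilities and `Y` one-hot labels):

```python
import numpy as np

def nn_cost(H, Y, thetas, lam):
    """Regularized logistic cost J(Theta).

    H: (m, K) predicted probabilities, Y: (m, K) one-hot labels,
    thetas: list of weight matrices, lam: regularization strength lambda.
    """
    m = Y.shape[0]
    logistic = -np.sum(Y * np.log(H) + (1 - Y) * np.log(1 - H)) / m
    # The first column of each Theta multiplies the bias unit and is
    # conventionally excluded from regularization.
    reg = lam / (2 * m) * sum(np.sum(t[:, 1:] ** 2) for t in thetas)
    return logistic + reg
```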
2.4 ⏪ Backpropagation (BP)
Use backpropagation to compute the partial derivatives

$$\frac{\partial}{\partial \Theta_{ij}^{(l)}} J(\Theta)$$

These are the gradients needed for optimization.
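For one training example, the delta terms and gradient contributions can be sketched as follows (sigmoid activations and the logistic loss are assumed, so the output error is simply $a^{(3)} - y$; the helper name `backprop_example` is ours):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def backprop_example(x, y, Theta1, Theta2):
    """Delta terms and gradient contributions for one training example."""
    # Forward pass, keeping the intermediate values we need
    a1 = np.concatenate(([1.0], x))
    z2 = Theta1 @ a1
    a2 = np.concatenate(([1.0], sigmoid(z2)))
    a3 = sigmoid(Theta2 @ a2)
    # Output-layer error, then the hidden-layer error (bias column dropped)
    d3 = a3 - y
    g2 = sigmoid(z2) * (1 - sigmoid(z2))   # sigmoid gradient at z2
    d2 = (Theta2[:, 1:].T @ d3) * g2
    # Per-example gradient contributions, to be accumulated over all examples
    return np.outer(d2, a1), np.outer(d3, a2)
```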
2.5 🎢 Gradient Checking
Use a numerical approximation to verify the backpropagation gradients:

$$\frac{\partial}{\partial \theta_j} J(\theta) \approx \frac{J(\ldots,\, \theta_j + \epsilon,\, \ldots) - J(\ldots,\, \theta_j - \epsilon,\, \ldots)}{2\epsilon}$$
⚠️ Once verified:
- Disable gradient checking
- It is computationally expensive
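A minimal implementation of the two-sided finite-difference check (the function name is ours; the sanity check uses $J(\theta) = \sum \theta^2$, whose gradient $2\theta$ is known exactly):

```python
import numpy as np

def numerical_gradient(J, theta, eps=1e-4):
    """Two-sided finite differences: (J(t + eps) - J(t - eps)) / (2 eps)."""
    grad = np.zeros_like(theta)
    for i in range(theta.size):
        e = np.zeros_like(theta)
        e.flat[i] = eps
        grad.flat[i] = (J(theta + e) - J(theta - e)) / (2 * eps)
    return grad

# Sanity check on J(t) = sum(t^2), whose gradient is exactly 2t.
theta = np.array([1.0, -2.0, 3.0])
approx = numerical_gradient(lambda t: np.sum(t ** 2), theta)
```

The inner loop perturbs one parameter at a time — this is why the check is expensive and should be disabled once backpropagation is verified.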
2.6 ⚖️ Minimize the Cost Function
Use:
- Gradient descent, or
- An advanced optimization routine (e.g., conjugate gradient or L-BFGS)

to minimize $J(\Theta)$.
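Plain batch gradient descent is a few lines. A sketch (the helper name and the toy objective $J(\theta) = \sum \theta^2$, with gradient $2\theta$ and minimum at zero, are our choices for illustration):

```python
import numpy as np

def gradient_descent(grad, theta0, alpha=0.1, iters=200):
    """Batch gradient descent: theta := theta - alpha * grad(theta)."""
    theta = theta0.astype(float).copy()
    for _ in range(iters):
        theta = theta - alpha * grad(theta)
    return theta

# Minimizing J(theta) = sum(theta^2); its gradient is 2*theta, minimum at 0.
theta_opt = gradient_descent(lambda t: 2 * t, np.array([5.0, -3.0]))
```

In practice the `grad` argument would be the backpropagation routine, and the learning rate `alpha` would need tuning.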
Training Loop
During training, we iterate over all examples:
for i = 1:m
  % Forward propagation: compute activations a^(l) for l = 2,...,L
  % Backpropagation: compute delta terms delta^(l) for l = L,...,2
  % Accumulate gradients: Delta^(l) := Delta^(l) + delta^(l+1) * (a^(l))'
end
For each example:
- Perform forward pass
- Compute errors
- Accumulate gradients
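The per-example steps above can be sketched concretely in NumPy (sigmoid activations and one hidden layer are assumed; the helper name `accumulate_gradients` is ours):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def accumulate_gradients(X, Y, Theta1, Theta2, lam=0.0):
    """One pass over all m examples: forward pass, delta terms, accumulation."""
    m = X.shape[0]
    D1 = np.zeros_like(Theta1)
    D2 = np.zeros_like(Theta2)
    for i in range(m):
        # Forward propagation: compute activations a^(l)
        a1 = np.concatenate(([1.0], X[i]))
        z2 = Theta1 @ a1
        a2 = np.concatenate(([1.0], sigmoid(z2)))
        a3 = sigmoid(Theta2 @ a2)
        # Backpropagation: compute delta terms for l = L,...,2
        d3 = a3 - Y[i]
        d2 = (Theta2[:, 1:].T @ d3) * sigmoid(z2) * (1 - sigmoid(z2))
        # Accumulate gradient contributions
        D1 += np.outer(d2, a1)
        D2 += np.outer(d3, a2)
    # Average, then add the regularization term (bias column excluded)
    G1, G2 = D1 / m, D2 / m
    G1[:, 1:] += (lam / m) * Theta1[:, 1:]
    G2[:, 1:] += (lam / m) * Theta2[:, 1:]
    return G1, G2
```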
Final Insight
Neural network training is simply:
- Forward propagation
- Backpropagation
- Gradient-based optimization
All of deep learning is built on this foundation.
Complete Neural Network Workflow
- Choose architecture
- Initialize weights randomly
- Implement forward propagation
- Implement cost function
- Implement backpropagation
- Perform gradient checking
- Optimize using gradient descent
- Train until convergence
