Training a Neural Network
In this post, we will put together all the pieces we've learned about neural networks to understand how to train a neural network effectively. We will cover the cost function, backpropagation, gradient checking, and random initialization, along with key intuitions for each step.
Putting It Together
Now that we have covered forward propagation, backpropagation, and gradient checking, let’s combine everything into a complete training pipeline.
1. 🔀 Choose a Network Architecture
First, decide the structure of your neural network:
- Number of layers
- Number of hidden units per layer
- Number of output units
How to Choose the Architecture
- Input layer size = dimension of feature vector
- Output layer size = number of output classes
- Hidden units:
  - More units usually perform better, but at greater computational cost
- Default choice:
  - Use 1 hidden layer
  - If using multiple hidden layers, use the same number of units in each layer
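The guidelines above can be written down concretely. As a sketch (the specific sizes — 20×20-pixel digit images with 10 classes — are an illustrative assumption, not prescribed by the text):

```python
# Hypothetical sizes for a handwritten-digit classifier:
# 20x20-pixel images flattened to 400 features, 10 digit classes.
input_layer_size = 400   # input layer size = dimension of the feature vector
hidden_layer_size = 25   # one hidden layer is a reasonable default
num_labels = 10          # output layer size = number of output classes

# Each layer transition gets one weight matrix; the +1 column is for the bias.
theta1_shape = (hidden_layer_size, input_layer_size + 1)
theta2_shape = (num_labels, hidden_layer_size + 1)
```

Note how the architecture alone fixes the shapes of all weight matrices before any training happens.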
2. 📚 Training a Neural Network
2.1 🎲 Randomly Initialize Weights
Initialize each weight randomly to a small value near zero (but not exactly zero).
This breaks the symmetry between hidden units and allows each to learn a different function.
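A minimal NumPy sketch of random initialization (the function name `rand_init` and the value `epsilon = 0.12` are our own choices — a common heuristic, not mandated by the text):

```python
import numpy as np

def rand_init(l_out, l_in, epsilon=0.12):
    """Weights drawn uniformly from [-epsilon, epsilon].

    All-zero initialization would leave every hidden unit computing the
    same function; small random values break that symmetry.
    The extra column (l_in + 1) multiplies the bias unit.
    """
    return np.random.uniform(-epsilon, epsilon, size=(l_out, l_in + 1))

Theta1 = rand_init(25, 400)  # hidden layer: 25 units, 400 inputs (+ bias)
```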
2.2 ⏩ Forward Propagation (FP)
For each training example $(x^{(i)}, y^{(i)})$, compute the activations $a^{(l)}$ for every layer, ending with the output $h_\Theta(x^{(i)})$.
This gives the network's prediction.
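Forward propagation for a single example can be sketched like this (a sigmoid activation is assumed, as in the rest of these notes; the helper names are ours):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward_prop(x, Theta1, Theta2):
    """Compute activations layer by layer for one example x."""
    a1 = np.concatenate(([1.0], x))                      # input + bias unit
    a2 = np.concatenate(([1.0], sigmoid(Theta1 @ a1)))   # hidden layer + bias
    a3 = sigmoid(Theta2 @ a2)                            # output = h_Theta(x)
    return a1, a2, a3
```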
2.3 💰 Implement the Cost Function
Compute the regularized cost:

$$J(\Theta) = -\frac{1}{m}\sum_{i=1}^{m}\sum_{k=1}^{K}\left[y_k^{(i)}\log\big(h_\Theta(x^{(i)})\big)_k + \big(1-y_k^{(i)}\big)\log\Big(1-\big(h_\Theta(x^{(i)})\big)_k\Big)\right] + \frac{\lambda}{2m}\sum_{l=1}^{L-1}\sum_{i=1}^{s_l}\sum_{j=1}^{s_{l+1}}\big(\Theta_{j,i}^{(l)}\big)^2$$

This includes:
- The logistic loss summed over all $K$ output units
- A regularization term over all weights (bias terms excluded)
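The cost can be computed directly from the predictions and labels. A minimal sketch (the function name `nn_cost` is ours; it assumes `H` holds predicted probabilities and `Y` one-hot labels):

```python
import numpy as np

def nn_cost(H, Y, thetas, lam):
    """Regularized logistic cost J(Theta).

    H: (m, K) predicted probabilities, Y: (m, K) one-hot labels,
    thetas: list of weight matrices, lam: regularization strength lambda.
    """
    m = Y.shape[0]
    logistic = -np.sum(Y * np.log(H) + (1 - Y) * np.log(1 - H)) / m
    # The first column of each Theta multiplies the bias unit and is
    # conventionally excluded from regularization.
    reg = lam / (2 * m) * sum(np.sum(t[:, 1:] ** 2) for t in thetas)
    return logistic + reg
```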
2.4 ⏪ Backpropagation (BP)
Use backpropagation to compute the partial derivatives

$$\frac{\partial}{\partial \Theta_{ij}^{(l)}} J(\Theta)$$

These are the gradients needed for optimization.
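For one training example, the delta terms and gradient contributions can be sketched as follows (sigmoid activations and the logistic loss are assumed, so the output error is simply $a^{(3)} - y$; the helper name `backprop_example` is ours):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def backprop_example(x, y, Theta1, Theta2):
    """Delta terms and gradient contributions for one training example."""
    # Forward pass, keeping the intermediate values we need
    a1 = np.concatenate(([1.0], x))
    z2 = Theta1 @ a1
    a2 = np.concatenate(([1.0], sigmoid(z2)))
    a3 = sigmoid(Theta2 @ a2)
    # Output-layer error, then the hidden-layer error (bias column dropped)
    d3 = a3 - y
    g2 = sigmoid(z2) * (1 - sigmoid(z2))   # sigmoid gradient at z2
    d2 = (Theta2[:, 1:].T @ d3) * g2
    # Per-example gradient contributions, to be accumulated over all examples
    return np.outer(d2, a1), np.outer(d3, a2)
```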
2.5 🎢 Gradient Checking
Use a numerical approximation to verify the backpropagation gradients:

$$\frac{\partial}{\partial \theta_j} J(\theta) \approx \frac{J(\ldots,\, \theta_j + \epsilon,\, \ldots) - J(\ldots,\, \theta_j - \epsilon,\, \ldots)}{2\epsilon}$$
⚠️ Once verified:
- Disable gradient checking
- It is computationally expensive
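A minimal implementation of the two-sided finite-difference check (the function name is ours; the sanity check uses $J(\theta) = \sum \theta^2$, whose gradient $2\theta$ is known exactly):

```python
import numpy as np

def numerical_gradient(J, theta, eps=1e-4):
    """Two-sided finite differences: (J(t + eps) - J(t - eps)) / (2 eps)."""
    grad = np.zeros_like(theta)
    for i in range(theta.size):
        e = np.zeros_like(theta)
        e.flat[i] = eps
        grad.flat[i] = (J(theta + e) - J(theta - e)) / (2 * eps)
    return grad

# Sanity check on J(t) = sum(t^2), whose gradient is exactly 2t.
theta = np.array([1.0, -2.0, 3.0])
approx = numerical_gradient(lambda t: np.sum(t ** 2), theta)
```

The inner loop perturbs one parameter at a time — this is why the check is expensive and should be disabled once backpropagation is verified.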
2.6 ⚖️ Minimize the Cost Function
Use:
- Gradient descent, or
- An advanced optimization routine (e.g., conjugate gradient or L-BFGS)

to minimize $J(\Theta)$.
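Plain batch gradient descent is a few lines. A sketch (the helper name and the toy objective $J(\theta) = \sum \theta^2$, with gradient $2\theta$ and minimum at zero, are our choices for illustration):

```python
import numpy as np

def gradient_descent(grad, theta0, alpha=0.1, iters=200):
    """Batch gradient descent: theta := theta - alpha * grad(theta)."""
    theta = theta0.astype(float).copy()
    for _ in range(iters):
        theta = theta - alpha * grad(theta)
    return theta

# Minimizing J(theta) = sum(theta^2); its gradient is 2*theta, minimum at 0.
theta_opt = gradient_descent(lambda t: 2 * t, np.array([5.0, -3.0]))
```

In practice the `grad` argument would be the backpropagation routine, and the learning rate `alpha` would need tuning.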
Training Loop
During training, we iterate over all examples:
for i = 1:m
  % Forward propagation: compute activations a^(l) for l = 2,...,L
  % Backpropagation: compute delta terms delta^(l) for l = L,...,2
  % Accumulate gradients: Delta^(l) := Delta^(l) + delta^(l+1) * (a^(l))'
end
For each example:
- Perform forward pass
- Compute errors
- Accumulate gradients
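The per-example steps above can be sketched concretely in NumPy (sigmoid activations and one hidden layer are assumed; the helper name `accumulate_gradients` is ours):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def accumulate_gradients(X, Y, Theta1, Theta2, lam=0.0):
    """One pass over all m examples: forward pass, delta terms, accumulation."""
    m = X.shape[0]
    D1 = np.zeros_like(Theta1)
    D2 = np.zeros_like(Theta2)
    for i in range(m):
        # Forward propagation: compute activations a^(l)
        a1 = np.concatenate(([1.0], X[i]))
        z2 = Theta1 @ a1
        a2 = np.concatenate(([1.0], sigmoid(z2)))
        a3 = sigmoid(Theta2 @ a2)
        # Backpropagation: compute delta terms for l = L,...,2
        d3 = a3 - Y[i]
        d2 = (Theta2[:, 1:].T @ d3) * sigmoid(z2) * (1 - sigmoid(z2))
        # Accumulate gradient contributions
        D1 += np.outer(d2, a1)
        D2 += np.outer(d3, a2)
    # Average, then add the regularization term (bias column excluded)
    G1, G2 = D1 / m, D2 / m
    G1[:, 1:] += (lam / m) * Theta1[:, 1:]
    G2[:, 1:] += (lam / m) * Theta2[:, 1:]
    return G1, G2
```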
Final Insight
Neural network training is simply:
- Forward propagation
- Backpropagation
- Gradient-based optimization
All of deep learning is built on this foundation.
Complete Neural Network Workflow
- Choose architecture
- Initialize weights randomly
- Implement forward propagation
- Implement cost function
- Implement backpropagation
- Perform gradient checking
- Optimize using gradient descent
- Train until convergence
