Neural Networks Overview
The Feature Explosion Problem
Why Do We Need Neural Networks?
Suppose we have $n$ input features $x_1, x_2, \dots, x_n$ and we want to compute a non-linear hypothesis $h_\theta(x)$.
We can do this with logistic regression by adding polynomial features.
For quadratic features:
- Quadratic terms grow roughly as $n^2/2$, so we will end up with about 5,000 additional features if we have 100 features.
For cubic features:
- Cubic terms grow roughly as $n^3/6$, so we will end up with about 166,000 additional features if we have 100 features.
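The counts above can be checked with a quick sketch (the values follow the $n^2/2$ and $n^3/6$ approximations, not an exact combinatorial count):

```python
# Rough count of additional polynomial features for n = 100 inputs.
n = 100

quadratic = n * (n + 1) // 2   # pairs x_i * x_j with i <= j  -> 5,050
cubic = n ** 3 // 6            # rough n^3/6 estimate         -> ~166,666

print(quadratic)  # 5050
print(cubic)      # 166666
```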
As the features become more complex, the number of parameters grows rapidly.
It becomes
- computationally expensive to compute the hypothesis with many features.
- Memory-heavy to store all the parameters.
- Prone to overfitting due to the large number of parameters.
In this case, we can use a neural network to compute the hypothesis more efficiently.
Practical Example: Image Recognition
Suppose we have a 100 × 100 pixel image as input.
- Each pixel is a feature, so we have 10,000 features.
- For RGB images, we have 3 color channels, so we have 30,000 features.
If we want to compute a hypothesis with quadratic features, we would have on the order of 450 million features, which is computationally infeasible.
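The "450 million" figure falls out of the same $n^2/2$ growth applied to the image features (a sketch, counting pairs of features with repetition):

```python
# Feature counts for a 100 x 100 RGB image.
pixels = 100 * 100                 # 10,000 grayscale features
rgb = pixels * 3                   # 30,000 features with 3 channels
quadratic = rgb * (rgb + 1) // 2   # pairs x_i * x_j with i <= j

print(quadratic)  # 450015000 -> on the order of 450 million
```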
Conclusion
Polynomial logistic regression:
- Works for small $n$
- Explodes combinatorially as $n$ grows
- Computationally infeasible for large feature sets (like images)
- We need a non-linear model that can capture complex relationships without explicitly generating all polynomial features.
Neural Networks as a Solution
Neural networks can compute complex hypotheses without explicitly generating all polynomial features.
Neurons as Computational Units
At a simple level, neurons are computational units.
They:
- Dendrites (head): take inputs
- Cell body: processes them by applying weights and an activation function
- Axon (tail): produces an output

Artificial Neurons
In artificial neural networks, we model neurons as mathematical functions.

In our machine learning model:
Inputs are features: $x_0, x_1, x_2, x_3$
- $x_0$ is the bias unit, and it is always equal to 1
The neuron performs a transformation on the input features.
The output is the hypothesis:

$$h_\theta(x) = \frac{1}{1 + e^{-\theta^T x}}$$

where

$$\theta^T x = \theta_0 x_0 + \theta_1 x_1 + \theta_2 x_2 + \theta_3 x_3$$

And it can be rewritten as $z = \theta^T x$, so the hypothesis can be expressed as:

$$h_\theta(x) = g(z)$$

$g$ is the activation function that introduces non-linearity into the model.
Activation Function
Neural networks use the same logistic (sigmoid) function as logistic regression:

$$g(z) = \frac{1}{1 + e^{-z}}$$

For a single unit, the hypothesis is:

$$h_\theta(x) = g(\theta^T x) = \frac{1}{1 + e^{-\theta^T x}}$$

This function is called the sigmoid activation function.
In neural networks:
- Parameters are called weights
- Outputs of neurons are called activations
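A single unit can be sketched in a few lines; the weight and input values below are hypothetical, chosen only to illustrate the computation:

```python
import math

def sigmoid(z):
    # g(z) = 1 / (1 + e^{-z})
    return 1.0 / (1.0 + math.exp(-z))

def neuron(theta, x):
    # Single logistic unit: h(x) = g(theta . x), where x includes the bias x0 = 1
    z = sum(t * xi for t, xi in zip(theta, x))
    return sigmoid(z)

# Hypothetical weights and inputs, just for illustration
theta = [-1.0, 2.0, 0.5]
x = [1.0, 0.3, 0.8]      # x[0] is the bias unit
print(neuron(theta, x))  # z = -1 + 0.6 + 0.4 = 0, so g(0) = 0.5
```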
Neural Network Structure
An artificial neural network is a computational graph with layers of artificial neurons.
A simple network looks like:

```mermaid
graph LR
Input --> Hidden-Layer
Hidden-Layer --> Output
```

Complex networks have multiple hidden layers.
- Layer 1 (input layer): takes the features and the bias unit
- Layer 2 (hidden layer): computes intermediate activations
- Layer $n$ (output layer): gives us the final hypothesis
Where:
- $a_i^{(j)}$ = activation of unit $i$ in layer $j$
- $a_1^{(2)}$: first neuron in the hidden layer
- $\Theta^{(j)}$ = weight matrix mapping layer $j$ to layer $j+1$
Each unit in the input layer is densely connected to each unit in the next layer.
Top down:

```mermaid
graph TD
subgraph IL["Input Layer"]
x1(((x1)))
x2(((x2)))
x3(((x3)))
end
subgraph HL["Hidden Layer"]
a1{a1}
a2{a2}
a3{a3}
end
subgraph OL["Output Layer"]
y(((hθx)))
end
x1 --> a1
x1 --> a2
x1 --> a3
x2 --> a1
x2 --> a2
x2 --> a3
x3 --> a1
x3 --> a2
x3 --> a3
a1 --> y
a2 --> y
a3 --> y
```
Left to right:

```mermaid
graph LR
subgraph IL["Input Layer"]
x1(((x1)))
x2(((x2)))
x3(((x3)))
end
subgraph HL["Hidden Layer"]
a1{a1}
a2{a2}
a3{a3}
end
subgraph OL["Output Layer"]
y(((hθx)))
end
x1 --> a1
x1 --> a2
x1 --> a3
x2 --> a1
x2 --> a2
x2 --> a3
x3 --> a1
x3 --> a2
x3 --> a3
a1 --> y
a2 --> y
a3 --> y
```
Simplified:

```mermaid
graph LR
x1(((x1))) --> a1{a1}
x1 --> a2{a2}
x1 --> a3{a3}
x2(((x2))) --> a1
x2 --> a2
x2 --> a3
x3(((x3))) --> a1
x3 --> a2
x3 --> a3
a1 --> y(((hθx)))
a2 --> y
a3 --> y
```
Advanced: a 4-layer neural network:

```mermaid
graph LR
%% Input Layer
subgraph IL["Input Layer"]
x1(((x1)))
x2(((x2)))
x3(((x3)))
end
%% Hidden Layer 1
subgraph H1["Hidden Layer 1"]
a1{a1}
a2{a2}
a3{a3}
end
%% Hidden Layer 2
subgraph H2["Hidden Layer 2"]
b1{b1}
b2{b2}
b3{b3}
end
%% Output Layer
subgraph OL["Output Layer"]
y(((hθx)))
end
%% Connections: Input → Hidden 1
x1 --> a1
x1 --> a2
x1 --> a3
x2 --> a1
x2 --> a2
x2 --> a3
x3 --> a1
x3 --> a2
x3 --> a3
%% Connections: Hidden 1 → Hidden 2
a1 --> b1
a1 --> b2
a1 --> b3
a2 --> b1
a2 --> b2
a2 --> b3
a3 --> b1
a3 --> b2
a3 --> b3
%% Connections: Hidden 2 → Output
b1 --> y
b2 --> y
b3 --> y
```
Neural Network Hypothesis
Neural networks are simply multiple logistic regression units stacked together.
Each layer:
- Takes activations from previous layer
- Multiplies by weight matrix
- Applies sigmoid activation
- Passes result forward
Computing Hidden Layer Activations
Each hidden unit is computed as:

$$a_1^{(2)} = g\left(\Theta_{10}^{(1)} x_0 + \Theta_{11}^{(1)} x_1 + \Theta_{12}^{(1)} x_2 + \Theta_{13}^{(1)} x_3\right)$$
$$a_2^{(2)} = g\left(\Theta_{20}^{(1)} x_0 + \Theta_{21}^{(1)} x_1 + \Theta_{22}^{(1)} x_2 + \Theta_{23}^{(1)} x_3\right)$$
$$a_3^{(2)} = g\left(\Theta_{30}^{(1)} x_0 + \Theta_{31}^{(1)} x_1 + \Theta_{32}^{(1)} x_2 + \Theta_{33}^{(1)} x_3\right)$$

This means:
- We use a 3 × 4 weight matrix $\Theta^{(1)}$
- Each row corresponds to one hidden unit
- Each row multiplies all input features (including the bias)
Output Layer
The final hypothesis is:

$$h_\Theta(x) = a_1^{(3)} = g\left(\Theta_{10}^{(2)} a_0^{(2)} + \Theta_{11}^{(2)} a_1^{(2)} + \Theta_{12}^{(2)} a_2^{(2)} + \Theta_{13}^{(2)} a_3^{(2)}\right)$$
So:
- Hidden layer outputs become inputs to the next layer
- Another weight matrix is applied
- Then the sigmoid function is applied again
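The full forward pass for the 3-4-1 network above can be sketched as follows; the weight values are made up purely for illustration:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def layer(Theta, a_prev):
    # One step of forward propagation:
    # prepend the bias unit, multiply by the weight matrix, apply sigmoid.
    a_prev = [1.0] + a_prev  # bias unit a0 = 1
    return [sigmoid(sum(w * a for w, a in zip(row, a_prev))) for row in Theta]

# Hypothetical weights (illustrative values only).
Theta1 = [[0.1, 0.2, -0.3, 0.4],   # 3 hidden units x (3 inputs + bias)
          [-0.5, 0.6, 0.7, -0.8],
          [0.9, -0.1, 0.2, 0.3]]
Theta2 = [[0.5, -1.0, 1.0, 0.5]]   # 1 output unit x (3 hidden units + bias)

x = [1.0, 0.5, -0.5]          # input features (without bias)
hidden = layer(Theta1, x)     # a^{(2)}: hidden layer activations
h = layer(Theta2, hidden)[0]  # h_Theta(x): final hypothesis
print(0.0 < h < 1.0)          # sigmoid output always lies in (0, 1)
```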
Weight Matrix Dimensions
If:
- Layer $j$ has $s_j$ units
- Layer $j+1$ has $s_{j+1}$ units
Then:

$$\Theta^{(j)} \text{ has dimension } s_{j+1} \times (s_j + 1)$$
Why the +1?
Because of the bias unit.
Important detail:
- The input side includes the bias unit ($s_j + 1$ columns)
- The output side does NOT include a bias unit ($s_{j+1}$ rows)
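The $s_{j+1} \times (s_j + 1)$ rule can be checked against the 3-4-1 network used earlier (a sketch; `theta_shape` is a hypothetical helper name):

```python
# Shape of the weight matrix mapping layer j (s_j units) to layer j+1.
def theta_shape(s_j, s_j_plus_1):
    # +1 column for the bias unit on the input side; no bias row on the output side.
    return (s_j_plus_1, s_j + 1)

print(theta_shape(3, 3))  # (3, 4): 3 inputs -> 3 hidden units
print(theta_shape(3, 1))  # (1, 4): 3 hidden units -> 1 output
```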
