
Vectorized Neural Networks Model Representation

Learn how to represent neural networks in a vectorized form, transforming scalar equations into efficient matrix operations for scalable and optimized computations.

Written by Hitesh Sahu, a passionate developer and blogger.

Fri Feb 27 2026


Forward Propagation

For any layer $j$:

Linear Step

Calculate the pre-activation term:

$$z^{(j)} = \Theta^{(j-1)} a^{(j-1)}$$

Activation Step

Apply the activation function:

$$a^{(j)} = g(z^{(j)})$$

This process is repeated until we reach the output layer.
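As a sketch, the two steps for one layer can be written in a few lines of NumPy (the function names here are my own, not from any particular library):

```python
import numpy as np

def sigmoid(z):
    # Elementwise logistic function: g(z) = 1 / (1 + e^(-z))
    return 1.0 / (1.0 + np.exp(-z))

def layer_forward(Theta, a_prev):
    # Linear step: z^(j) = Theta^(j-1) a^(j-1)
    z = Theta @ a_prev
    # Activation step: a^(j) = g(z^(j))
    return sigmoid(z)
```

Calling `layer_forward` once per layer, feeding each output into the next call, implements the repetition described above.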


From Scalar Equations to Vector Form


graph LR

%% Input Layer
    subgraph Input Layer
        x1(((x1)))
        x2(((x2)))
        x3(((x3)))
    end

%% Hidden Layer 1
    subgraph Hidden Layer 1
        a1{a1}
        a2{a2}
        a3{a3}
    end



%% Output Layer
    subgraph Output Layer
        y(((hθx)))
    end

%% Connections: Input → Hidden 1
    x1 --> a1
    x1 --> a2
    x1 --> a3
    
    x2 --> a1
    x2 --> a2
    x2 --> a3
    
    x3 --> a1
    x3 --> a2
    x3 --> a3

%% Connections: Hidden 1 → Output
    a1 --> y
    a2 --> y
    a3 --> y

Previously, we wrote each neuron separately.

For the hidden layer:

$$a^{(2)}_1 = g\left(\Theta^{(1)}_{10}x_0 + \Theta^{(1)}_{11}x_1 + \Theta^{(1)}_{12}x_2 + \Theta^{(1)}_{13}x_3\right)$$
$$a^{(2)}_2 = g\left(\Theta^{(1)}_{20}x_0 + \Theta^{(1)}_{21}x_1 + \Theta^{(1)}_{22}x_2 + \Theta^{(1)}_{23}x_3\right)$$
$$a^{(2)}_3 = g\left(\Theta^{(1)}_{30}x_0 + \Theta^{(1)}_{31}x_1 + \Theta^{(1)}_{32}x_2 + \Theta^{(1)}_{33}x_3\right)$$

Where

  • The superscript in $a^{(2)}$ indicates layer 2 (the hidden layer)
  • $g(\cdot)$ is the sigmoid function

The final hypothesis is:

$$h_\Theta(x) = a^{(3)}_1$$

Where

$$a^{(3)}_1 = g\left(\Theta^{(2)}_{10}a^{(2)}_0 + \Theta^{(2)}_{11}a^{(2)}_1 + \Theta^{(2)}_{12}a^{(2)}_2 + \Theta^{(2)}_{13}a^{(2)}_3\right)$$

Writing every neuron out by hand does not scale, so we vectorize the computation for more complex use cases.


Pre-Activation Term $z_k^{(j)}$

This intermediate variable contains the weighted sum before activation.

Suppose

$$z_1^{(2)} = \Theta^{(1)}_{10}x_0 + \Theta^{(1)}_{11}x_1 + \Theta^{(1)}_{12}x_2 + \Theta^{(1)}_{13}x_3$$
$$z_2^{(2)} = \Theta^{(1)}_{20}x_0 + \Theta^{(1)}_{21}x_1 + \Theta^{(1)}_{22}x_2 + \Theta^{(1)}_{23}x_3$$
$$z_3^{(2)} = \Theta^{(1)}_{30}x_0 + \Theta^{(1)}_{31}x_1 + \Theta^{(1)}_{32}x_2 + \Theta^{(1)}_{33}x_3$$

🧠 Generalized pre-activation term:

$$z^{(j)}_k = \Theta^{(j-1)}_{k,0} a^{(j-1)}_0 + \Theta^{(j-1)}_{k,1} a^{(j-1)}_1 + \dots + \Theta^{(j-1)}_{k,n} a^{(j-1)}_n$$

Then

$$a^{(2)}_1 = g(z_1^{(2)}) \qquad a^{(2)}_2 = g(z_2^{(2)}) \qquad a^{(2)}_3 = g(z_3^{(2)})$$

🧠 Generalized Activation

$$a^{(j)}_k = g\left(z^{(j)}_k\right)$$

This separates the computation into:

  • Linear computation
  • Nonlinear activation
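To see that the scalar sums and a single matrix product agree, here is a small NumPy check with made-up numbers (a layer with 2 units, fed by 3 activations plus a bias):

```python
import numpy as np

# Hypothetical weights for layer j: 2 units, fed by 3 activations plus bias
Theta = np.array([[0.1, 0.2, 0.3, 0.4],
                  [0.5, 0.6, 0.7, 0.8]])
a_prev = np.array([1.0, 0.5, -0.5, 2.0])   # a_0 = 1 is the bias unit

# Scalar form: z_k = sum over i of Theta[k, i] * a_prev[i]
z_scalar = np.array([sum(Theta[k, i] * a_prev[i] for i in range(4))
                     for k in range(2)])

# Vectorized form: z = Theta @ a_prev
z_vec = Theta @ a_prev

print(np.allclose(z_scalar, z_vec))  # True
```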

Vector Representation

Input layer:

$$x = \begin{bmatrix} x_0 \\ x_1 \\ \vdots \\ x_n \end{bmatrix}$$

Where $x_0 = 1$ is the bias unit.

Let

$$a^{(1)} = x$$

Weighted sum vector:

$$z^{(j)} = \begin{bmatrix} z^{(j)}_1 \\ z^{(j)}_2 \\ \vdots \\ z^{(j)}_{s_j} \end{bmatrix}$$

Where:

$$s_j = \text{number of units in layer } j$$

We can calculate $z$ as

$$z^{(2)} = \Theta^{(1)}x$$

Since $x = a^{(1)}$, we can rewrite this as:

$$z^{(2)} = \Theta^{(1)} a^{(1)}$$

🧠 Generalized vectorized pre-activation term:

$$z^{(j)} = \Theta^{(j-1)} a^{(j-1)}$$

with dimensions:

$$\Theta^{(j-1)} \in \mathbb{R}^{s_j \times (n+1)} \qquad a^{(j-1)} \in \mathbb{R}^{(n+1) \times 1} \qquad z^{(j)} \in \mathbb{R}^{s_j \times 1}$$
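These shapes can be checked directly; a quick NumPy sanity check with arbitrarily chosen sizes:

```python
import numpy as np

s_j, n = 3, 3    # hypothetical: 3 units in layer j, n inputs plus a bias
Theta = np.zeros((s_j, n + 1))   # Theta^(j-1) has shape (s_j, n+1)
a_prev = np.zeros((n + 1, 1))    # a^(j-1) has shape (n+1, 1)
z = Theta @ a_prev               # z^(j) comes out with shape (s_j, 1)
print(z.shape)  # (3, 1)
```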

Activation Function

Since

$$a^{(2)}_2 = g(z_2^{(2)})$$

Generalized Activation Function:

$$a^{(j)} = g\left(z^{(j)}\right)$$

If using sigmoid:

$$g(z) = \frac{1}{1 + e^{-z}}$$
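Because NumPy applies `exp` elementwise, a single sigmoid definition covers a whole layer of pre-activations at once; a small sketch:

```python
import numpy as np

def sigmoid(z):
    # g(z) = 1 / (1 + e^(-z)), applied elementwise
    return 1.0 / (1.0 + np.exp(-z))

z = np.array([-2.0, 0.0, 2.0])   # pre-activations for a 3-unit layer
a = sigmoid(z)                    # activates all three units at once
```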

Add Bias Unit

After computing $a^{(2)}$, add:

$$a_0^{(2)} = 1$$

Now:

$$a^{(j)} = \begin{bmatrix} 1 \\ a^{(j)}_1 \\ \vdots \\ a^{(j)}_{s_j} \end{bmatrix}$$
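In code, adding the bias unit is just prepending a 1 to the activation vector; a sketch with made-up activations:

```python
import numpy as np

a2 = np.array([0.7, 0.2, 0.9])      # hypothetical activations of layer 2
a2 = np.concatenate(([1.0], a2))    # prepend the bias unit a_0 = 1
# a2 is now [1.0, 0.7, 0.2, 0.9]
```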

Output Layer

Repeat the same process:

Calculate the linear term $z$:

$$z^{(j+1)} = \Theta^{(j)} a^{(j)}$$

Apply the sigmoid activation to $z$:

$$a^{(j+1)} = g\left(z^{(j+1)}\right)$$

Final hypothesis:

$$h_\Theta(x) = a^{(3)} = g(z^{(3)})$$

🧠 Generalized Hypothesis

$$h_\Theta(x) = a^{(j+1)} = g\left(z^{(j+1)}\right)$$

The Big Picture

Each layer performs:

$$\text{Linear transformation:} \quad z = \Theta a$$

followed by

$$\text{Nonlinearity:} \quad a = g(z)$$

Stacking these layers allows neural networks to represent complex nonlinear functions.
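Putting the pieces together, a full forward pass is just this two-step loop over layers; a minimal sketch assuming sigmoid activations and a list of weight matrices (the names are mine, not from the post):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, Thetas):
    """Forward pass: x is the raw input (no bias), Thetas holds Theta^(1..L-1)."""
    a = x
    for Theta in Thetas:
        a = np.concatenate(([1.0], a))  # add the bias unit a_0 = 1
        z = Theta @ a                   # linear transformation: z = Theta a
        a = sigmoid(z)                  # nonlinearity: a = g(z)
    return a                            # h_Theta(x)
```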

Intuition

If we remove the hidden layer, the model becomes logistic regression:

$$h(x) = g(\theta^T x)$$

With hidden layers, the network instead uses learned features:

$$a_1, a_2, a_3$$

These are:

  • Computed by the hidden layer
  • Learned from data
  • Controlled by parameters $\Theta^{(1)}$

So a neural network is:

Logistic regression on learned features.
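The no-hidden-layer case is easy to see in code: with a single parameter vector, the whole model collapses to one linear step plus a sigmoid (the parameter values below are made up):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

theta = np.array([0.5, -1.0, 2.0])   # hypothetical parameters, bias first
x = np.array([1.0, 3.0, 0.25])       # x_0 = 1 is the bias term
h = sigmoid(theta @ x)               # h(x) = g(theta^T x), a single number
```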
