
Vectorized Neural Networks Model Representation

Learn how to represent neural networks in a vectorized form, transforming scalar equations into efficient matrix operations for scalable and optimized computations.

Written by Hitesh Sahu, a passionate developer and blogger.

Fri Feb 27 2026


Forward Propagation

For any layer $j$:

Linear Step

Calculate the pre-activation term:

$$z^{(j)} = \Theta^{(j-1)} a^{(j-1)}$$

Activation Step

Apply the activation function:

$$a^{(j)} = g(z^{(j)})$$

This process is repeated until we reach the output layer.
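As a sketch, the two steps for one layer can be written in a few lines of NumPy (the function names here are my own, not from any particular library):

```python
import numpy as np

def sigmoid(z):
    # Elementwise logistic function: g(z) = 1 / (1 + e^(-z))
    return 1.0 / (1.0 + np.exp(-z))

def layer_forward(Theta, a_prev):
    # Linear step: z^(j) = Theta^(j-1) a^(j-1)
    z = Theta @ a_prev
    # Activation step: a^(j) = g(z^(j))
    return sigmoid(z)
```

Calling `layer_forward` once per layer, feeding each output into the next call, implements the repetition described above.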


From Scalar Equations to Vector Form


graph LR

%% Input Layer
    subgraph Input Layer
        x1(((x1)))
        x2(((x2)))
        x3(((x3)))
    end

%% Hidden Layer 1
    subgraph Hidden Layer 1
        a1{a1}
        a2{a2}
        a3{a3}
    end



%% Output Layer
    subgraph Output Layer
        y(((hθx)))
    end

%% Connections: Input → Hidden 1
    x1 --> a1
    x1 --> a2
    x1 --> a3
    
    x2 --> a1
    x2 --> a2
    x2 --> a3
    
    x3 --> a1
    x3 --> a2
    x3 --> a3

%% Connections: Hidden 1 → Output
    a1 --> y
    a2 --> y
    a3 --> y

Previously, we wrote each neuron separately.

For the hidden layer:

$$a^{(2)}_1 = g\left(\Theta^{(1)}_{10}x_0 + \Theta^{(1)}_{11}x_1 + \Theta^{(1)}_{12}x_2 + \Theta^{(1)}_{13}x_3\right)$$
$$a^{(2)}_2 = g\left(\Theta^{(1)}_{20}x_0 + \Theta^{(1)}_{21}x_1 + \Theta^{(1)}_{22}x_2 + \Theta^{(1)}_{23}x_3\right)$$
$$a^{(2)}_3 = g\left(\Theta^{(1)}_{30}x_0 + \Theta^{(1)}_{31}x_1 + \Theta^{(1)}_{32}x_2 + \Theta^{(1)}_{33}x_3\right)$$

Where

  • The superscript in $a^{(2)}$ indicates layer 2 (the hidden layer)
  • $g(\cdot)$ is the sigmoid function

The final hypothesis is:

$$h_\Theta(x) = a^{(3)}_1$$

Where

$$a^{(3)}_1 = g\left(\Theta^{(2)}_{10}a^{(2)}_0 + \Theta^{(2)}_{11}a^{(2)}_1 + \Theta^{(2)}_{12}a^{(2)}_2 + \Theta^{(2)}_{13}a^{(2)}_3\right)$$

Writing every neuron out by hand does not scale, so we vectorize the computation for more complex use cases.


Pre-Activation Term $z_k^{(j)}$

This intermediate variable contains the weighted sum before activation.

Suppose

$$z_1^{(2)} = \Theta^{(1)}_{10}x_0 + \Theta^{(1)}_{11}x_1 + \Theta^{(1)}_{12}x_2 + \Theta^{(1)}_{13}x_3$$
$$z_2^{(2)} = \Theta^{(1)}_{20}x_0 + \Theta^{(1)}_{21}x_1 + \Theta^{(1)}_{22}x_2 + \Theta^{(1)}_{23}x_3$$
$$z_3^{(2)} = \Theta^{(1)}_{30}x_0 + \Theta^{(1)}_{31}x_1 + \Theta^{(1)}_{32}x_2 + \Theta^{(1)}_{33}x_3$$

🧠 Generalized pre-activation term:

$$z^{(j)}_k = \Theta^{(j-1)}_{k,0} a^{(j-1)}_0 + \Theta^{(j-1)}_{k,1} a^{(j-1)}_1 + \dots + \Theta^{(j-1)}_{k,n} a^{(j-1)}_n$$

Then

$$a^{(2)}_1 = g(z_1^{(2)}) \qquad a^{(2)}_2 = g(z_2^{(2)}) \qquad a^{(2)}_3 = g(z_3^{(2)})$$

🧠 Generalized Activation

$$a^{(j)}_k = g\left(z^{(j)}_k\right)$$

This separates the computation into:

  • Linear computation
  • Nonlinear activation
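To see that the scalar sums and a single matrix product agree, here is a small NumPy check with made-up numbers (a layer with 2 units, fed by 3 activations plus a bias):

```python
import numpy as np

# Hypothetical weights for layer j: 2 units, fed by 3 activations plus bias
Theta = np.array([[0.1, 0.2, 0.3, 0.4],
                  [0.5, 0.6, 0.7, 0.8]])
a_prev = np.array([1.0, 0.5, -0.5, 2.0])   # a_0 = 1 is the bias unit

# Scalar form: z_k = sum over i of Theta[k, i] * a_prev[i]
z_scalar = np.array([sum(Theta[k, i] * a_prev[i] for i in range(4))
                     for k in range(2)])

# Vectorized form: z = Theta @ a_prev
z_vec = Theta @ a_prev

print(np.allclose(z_scalar, z_vec))  # True
```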

Vector Representation

Input layer:

$$x = \begin{bmatrix} x_0 \\ x_1 \\ \vdots \\ x_n \end{bmatrix}$$

Where $x_0 = 1$ is the bias unit.

Let

$$a^{(1)} = x$$

Weighted sum vector:

$$z^{(j)} = \begin{bmatrix} z^{(j)}_1 \\ z^{(j)}_2 \\ \vdots \\ z^{(j)}_{s_j} \end{bmatrix}$$

Where:

$$s_j = \text{number of units in layer } j$$

We can calculate $z$ as

$$z^{(2)} = \Theta^{(1)}x$$

Since $x = a^{(1)}$, we can rewrite this as:

$$z^{(2)} = \Theta^{(1)} a^{(1)}$$

🧠 Generalized vectorized pre-activation term:

$$z^{(j)} = \Theta^{(j-1)} a^{(j-1)}$$

with dimensions:

$$\Theta^{(j-1)} \in \mathbb{R}^{s_j \times (n+1)} \qquad a^{(j-1)} \in \mathbb{R}^{(n+1) \times 1} \qquad z^{(j)} \in \mathbb{R}^{s_j \times 1}$$
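These shapes can be checked directly; a quick NumPy sanity check with arbitrarily chosen sizes:

```python
import numpy as np

s_j, n = 3, 3    # hypothetical: 3 units in layer j, n inputs plus a bias
Theta = np.zeros((s_j, n + 1))   # Theta^(j-1) has shape (s_j, n+1)
a_prev = np.zeros((n + 1, 1))    # a^(j-1) has shape (n+1, 1)
z = Theta @ a_prev               # z^(j) comes out with shape (s_j, 1)
print(z.shape)  # (3, 1)
```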

Activation Function

Since

$$a^{(2)}_2 = g(z_2^{(2)})$$

Generalized Activation Function:

$$a^{(j)} = g\left(z^{(j)}\right)$$

If using sigmoid:

$$g(z) = \frac{1}{1 + e^{-z}}$$
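Because NumPy applies `exp` elementwise, a single sigmoid definition covers a whole layer of pre-activations at once; a small sketch:

```python
import numpy as np

def sigmoid(z):
    # g(z) = 1 / (1 + e^(-z)), applied elementwise
    return 1.0 / (1.0 + np.exp(-z))

z = np.array([-2.0, 0.0, 2.0])   # pre-activations for a 3-unit layer
a = sigmoid(z)                    # activates all three units at once
```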

Add Bias Unit

After computing $a^{(2)}$, add:

$$a_0^{(2)} = 1$$

Now:

$$a^{(j)} = \begin{bmatrix} 1 \\ a^{(j)}_1 \\ \vdots \\ a^{(j)}_{s_j} \end{bmatrix}$$
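In code, adding the bias unit is just prepending a 1 to the activation vector; a sketch with made-up activations:

```python
import numpy as np

a2 = np.array([0.7, 0.2, 0.9])      # hypothetical activations of layer 2
a2 = np.concatenate(([1.0], a2))    # prepend the bias unit a_0 = 1
# a2 is now [1.0, 0.7, 0.2, 0.9]
```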

Output Layer

Repeat the same process:

Calculate the linear term $z$:

$$z^{(j+1)} = \Theta^{(j)} a^{(j)}$$

Apply the sigmoid activation to $z$:

$$a^{(j+1)} = g\left(z^{(j+1)}\right)$$

Final hypothesis:

$$h_\Theta(x) = a^{(3)} = g(z^{(3)})$$

🧠 Generalized Hypothesis

$$h_\Theta(x) = a^{(j+1)} = g\left(z^{(j+1)}\right)$$

The Big Picture

Each layer performs:

$$\text{Linear transformation:} \quad z = \Theta a$$

followed by

$$\text{Nonlinearity:} \quad a = g(z)$$

Stacking these layers allows neural networks to represent complex nonlinear functions.
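Putting the pieces together, a full forward pass is just this two-step loop over layers; a minimal sketch assuming sigmoid activations and a list of weight matrices (the names are mine, not from the post):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, Thetas):
    """Forward pass: x is the raw input (no bias), Thetas holds Theta^(1..L-1)."""
    a = x
    for Theta in Thetas:
        a = np.concatenate(([1.0], a))  # add the bias unit a_0 = 1
        z = Theta @ a                   # linear transformation: z = Theta a
        a = sigmoid(z)                  # nonlinearity: a = g(z)
    return a                            # h_Theta(x)
```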

Intuition

If we remove the hidden layer, the model becomes logistic regression:

$$h(x) = g(\theta^T x)$$

With hidden layers, the network instead uses learned features:

$$a_1, a_2, a_3$$

These are:

  • Computed by the hidden layer
  • Learned from data
  • Controlled by parameters $\Theta^{(1)}$

So a neural network is:

Logistic regression on learned features.
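The no-hidden-layer case is easy to see in code: with a single parameter vector, the whole model collapses to one linear step plus a sigmoid (the parameter values below are made up):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

theta = np.array([0.5, -1.0, 2.0])   # hypothetical parameters, bias first
x = np.array([1.0, 3.0, 0.25])       # x_0 = 1 is the bias term
h = sigmoid(theta @ x)               # h(x) = g(theta^T x), a single number
```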
