
Vectorized Neural Networks Model Representation

Learn how to represent neural networks in a vectorized form, transforming scalar equations into efficient matrix operations for scalable and optimized computations.

Hitesh Sahu
Written by Hitesh Sahu, a passionate developer and blogger.

Fri Feb 27 2026


Forward Propagation

For any layer j:

Linear Step

Calculate the pre-activation term:

z^{(j)} = \Theta^{(j-1)} a^{(j-1)}

Activation Step

Apply the activation function:

a^{(j)} = g(z^{(j)})

This process is repeated until the output layer.
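These two repeated steps can be sketched in NumPy. This is a minimal illustration, not a full implementation: the `forward` helper and the random weight matrices below are hypothetical placeholders, not learned parameters.

```python
import numpy as np

def sigmoid(z):
    """Elementwise sigmoid activation: g(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, thetas):
    """Forward propagation: repeat the linear step and activation step per layer.

    x      : input vector a^(1), WITHOUT the bias unit
    thetas : list of weight matrices [Theta^(1), Theta^(2), ...],
             where Theta^(j) maps layer j to layer j+1
    """
    a = np.asarray(x, dtype=float)
    for theta in thetas:
        a = np.insert(a, 0, 1.0)   # prepend bias unit a_0 = 1
        z = theta @ a              # linear step:     z^(j+1) = Theta^(j) a^(j)
        a = sigmoid(z)             # activation step: a^(j+1) = g(z^(j+1))
    return a

# Hypothetical weights for a 3-3-1 network; the values are illustrative only.
rng = np.random.default_rng(0)
Theta1 = rng.standard_normal((3, 4))   # hidden layer: 3 units, 3 inputs + bias
Theta2 = rng.standard_normal((1, 4))   # output layer: 1 unit, 3 hidden units + bias
h = forward([0.5, -1.2, 0.3], [Theta1, Theta2])
```

Because the output unit is a sigmoid, `h` always lands in (0, 1), which is what lets us read it as a hypothesis probability.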


From Scalar Equations to Vector Form


```mermaid
graph LR

%% Input Layer
    subgraph Input Layer
        x1(((x1)))
        x2(((x2)))
        x3(((x3)))
    end

%% Hidden Layer 1
    subgraph Hidden Layer 1
        a1{a1}
        a2{a2}
        a3{a3}
    end

%% Output Layer
    subgraph Output Layer
        y(((hθx)))
    end

%% Connections: Input → Hidden 1
    x1 --> a1
    x1 --> a2
    x1 --> a3

    x2 --> a1
    x2 --> a2
    x2 --> a3

    x3 --> a1
    x3 --> a2
    x3 --> a3

%% Connections: Hidden 1 → Output
    a1 --> y
    a2 --> y
    a3 --> y
```

Previously, we wrote each neuron separately.

For the hidden layer:

a^{(2)}_1 = g\left(\Theta^{(1)}_{10}x_0 + \Theta^{(1)}_{11}x_1 + \Theta^{(1)}_{12}x_2 + \Theta^{(1)}_{13}x_3\right)
a^{(2)}_2 = g\left(\Theta^{(1)}_{20}x_0 + \Theta^{(1)}_{21}x_1 + \Theta^{(1)}_{22}x_2 + \Theta^{(1)}_{23}x_3\right)
a^{(2)}_3 = g\left(\Theta^{(1)}_{30}x_0 + \Theta^{(1)}_{31}x_1 + \Theta^{(1)}_{32}x_2 + \Theta^{(1)}_{33}x_3\right)

Where

  • Superscript a^{(2)} indicates layer 2 (the hidden layer)
  • g(\cdot) is the sigmoid activation function

The final hypothesis is:

h_\Theta(x) = a^{(3)}_1

Where

a^{(3)}_1 = g\left(\Theta^{(2)}_{10}a^{(2)}_0 + \Theta^{(2)}_{11}a^{(2)}_1 + \Theta^{(2)}_{12}a^{(2)}_2 + \Theta^{(2)}_{13}a^{(2)}_3\right)

Writing out every neuron by hand does not scale, so we vectorize these equations for larger networks.


Pre-Activation Term z_k^{(j)}

The intermediate variable z_k^{(j)} contains the weighted sum computed before the activation is applied:

Suppose

z_1^{(2)} = \Theta^{(1)}_{10}x_0 + \Theta^{(1)}_{11}x_1 + \Theta^{(1)}_{12}x_2 + \Theta^{(1)}_{13}x_3
z_2^{(2)} = \Theta^{(1)}_{20}x_0 + \Theta^{(1)}_{21}x_1 + \Theta^{(1)}_{22}x_2 + \Theta^{(1)}_{23}x_3
z_3^{(2)} = \Theta^{(1)}_{30}x_0 + \Theta^{(1)}_{31}x_1 + \Theta^{(1)}_{32}x_2 + \Theta^{(1)}_{33}x_3

🧠 Generalized pre-activation term

z^{(j)}_k = \Theta^{(j-1)}_{k,0} a^{(j-1)}_0 + \Theta^{(j-1)}_{k,1} a^{(j-1)}_1 + \dots + \Theta^{(j-1)}_{k,n} a^{(j-1)}_n

Then

a^{(2)}_1 = g(z_1^{(2)})
a^{(2)}_2 = g(z_2^{(2)})
a^{(2)}_3 = g(z_3^{(2)})

🧠 Generalized Activation

a^{(j)}_k = g\left(z^{(j)}_k\right)

This separates:

  • Linear computation
  • Nonlinear activation

Vector Representation

Input layer:

x = \begin{bmatrix} x_0 \\ x_1 \\ \vdots \\ x_n \end{bmatrix}

Where x_0 = 1 (the bias unit)

Let

a^{(1)} = x

Weighted sum vector:

z^{(j)} = \begin{bmatrix} z^{(j)}_1 \\ z^{(j)}_2 \\ \vdots \\ z^{(j)}_{s_j} \end{bmatrix}

Where:

s_j = \text{number of units in layer } j

We can calculate z as

z^{(2)} = \Theta^{(1)} x

Since x = a^{(1)}, we can rewrite this as:

z^{(2)} = \Theta^{(1)} a^{(1)}

🧠 Generalized vectorized pre-activation term:

z^{(j)} = \Theta^{(j-1)} a^{(j-1)}

Where the dimensions are (with n = s_{j-1}, the number of units in layer j-1):

\Theta^{(j-1)} \in \mathbb{R}^{s_j \times (n+1)}
a^{(j-1)} \in \mathbb{R}^{(n+1) \times 1}
z^{(j)} \in \mathbb{R}^{s_j \times 1}
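We can check in NumPy that the vectorized product reproduces the per-unit sums, and that the dimensions line up as stated. The matrix and vector below are random placeholders purely for the shape check.

```python
import numpy as np

rng = np.random.default_rng(1)
s_j, n = 3, 3                                  # s_j units in layer j, n units in layer j-1
Theta = rng.standard_normal((s_j, n + 1))      # Theta^(j-1) in R^{s_j x (n+1)}
a_prev = np.insert(rng.standard_normal(n), 0, 1.0)  # a^(j-1) with bias a_0 = 1

# Scalar form: z_k = sum over i of Theta_{k,i} * a_i, one sum per unit k
z_scalar = np.array([sum(Theta[k, i] * a_prev[i] for i in range(n + 1))
                     for k in range(s_j)])

# Vectorized form: z^(j) = Theta^(j-1) a^(j-1)
z_vec = Theta @ a_prev
```

The single matrix-vector product replaces s_j hand-written sums, which is exactly the scaling win of the vectorized representation.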

Activation Function

Since

a^{(2)}_2 = g(z_2^{(2)})

Generalized Activation Function:

a^{(j)} = g\left(z^{(j)}\right)

If using sigmoid:

g(z) = \frac{1}{1 + e^{-z}}
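A one-line NumPy version of the sigmoid; because `np.exp` works elementwise, the same function applies g to a whole vector z^{(j)} at once:

```python
import numpy as np

def sigmoid(z):
    """g(z) = 1 / (1 + e^{-z}), applied elementwise, so g(z^(j)) works on whole vectors."""
    return 1.0 / (1.0 + np.exp(-z))

mid = sigmoid(0.0)                           # 0.5, the decision boundary
vec = sigmoid(np.array([-10.0, 0.0, 10.0]))  # saturates toward 0 and 1 at the extremes
```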

Add Bias Unit

After computing a^{(2)}, add the bias unit:

a_0^{(2)} = 1

Now:

a^{(j)} = \begin{bmatrix} 1 \\ a^{(j)}_1 \\ \vdots \\ a^{(j)}_{s_j} \end{bmatrix}
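In NumPy this is a single `np.insert` (one of several equivalent ways to prepend an element; the activation values below are illustrative):

```python
import numpy as np

a2 = np.array([0.7, 0.2, 0.9])   # a^(2) = g(z^(2)) for a 3-unit hidden layer (illustrative)
a2 = np.insert(a2, 0, 1.0)       # prepend the bias unit a_0^(2) = 1
# a2 is now [1.0, 0.7, 0.2, 0.9], ready for z^(3) = Theta^(2) a^(2)
```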

Output Layer

Repeat the same process:

Calculate the linear term z:

z^{(j+1)} = \Theta^{(j)} a^{(j)}

Apply the sigmoid activation to z:

a^{(j+1)} = g\left(z^{(j+1)}\right)

Final hypothesis:

h_\Theta(x) = a^{(3)} = g(z^{(3)})

🧠 Generalized Hypothesis

h_\Theta(x) = a^{(j+1)} = g\left(z^{(j+1)}\right)
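Tracing the whole pipeline for the 3-3-1 network from the diagram, step by step in NumPy (the weight matrices are random stand-ins, not trained values):

```python
import numpy as np

g = lambda z: 1.0 / (1.0 + np.exp(-z))   # sigmoid activation

x = np.array([1.0, 0.5, -1.0, 2.0])      # a^(1): input with bias x_0 = 1 already prepended
rng = np.random.default_rng(42)
Theta1 = rng.standard_normal((3, 4))     # hypothetical Theta^(1) (input -> hidden)
Theta2 = rng.standard_normal((1, 4))     # hypothetical Theta^(2) (hidden -> output)

z2 = Theta1 @ x                          # z^(2) = Theta^(1) a^(1)
a2 = g(z2)                               # a^(2) = g(z^(2))
a2 = np.insert(a2, 0, 1.0)               # add bias unit a_0^(2) = 1
z3 = Theta2 @ a2                         # z^(3) = Theta^(2) a^(2)
h = g(z3)                                # h_Theta(x) = a^(3) = g(z^(3))
```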

The Big Picture

Each layer performs:

Linear transformation: z = \Theta a

followed by

Nonlinearity: a = g(z)

Stacking these layers allows neural networks to represent complex nonlinear functions.

Intuition

If we remove the hidden layer, the model becomes logistic regression:

h(x) = g(\theta^T x)

With hidden layers, the network instead uses learned features:

a_1, a_2, a_3

These are:

  • Computed by the hidden layer
  • Learned from data
  • Controlled by parameters \Theta^{(1)}

So a neural network is:

Logistic regression on learned features.
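That claim is easy to verify numerically: applying the output layer to the hidden activations a^{(2)} gives exactly g(\theta^T a), i.e. logistic regression over the learned features. The feature values and weights below are hypothetical.

```python
import numpy as np

g = lambda z: 1.0 / (1.0 + np.exp(-z))

# Hypothetical learned features a^(2) (bias included) and output-layer weights Theta^(2)
a2 = np.array([1.0, 0.3, 0.8, 0.1])
rng = np.random.default_rng(7)
Theta2 = rng.standard_normal((1, 4))

h_network = g(Theta2 @ a2)      # the network's output layer
h_logreg = g(Theta2[0] @ a2)    # logistic regression g(theta^T a) on the same features
```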
