
☠️ Advanced Multivariate Linear Algebra

Detailed explanation of the Normal Equation for linear regression, including matrix formulation, closed-form solution, comparison with gradient descent, and practical considerations for implementation.

Written by Hitesh Sahu

Thu Feb 19 2026

💀 When Geometry Becomes Dangerous

Linear algebra is the language of space manipulation.

Machine learning is controlled geometric transformation.

Why Linear Algebra Matters in ML

Machine learning deals with:

  • Multiple features
  • Large datasets
  • Efficient computation
  • Vectorized operations

Instead of writing nested loops, we use matrix operations to compute predictions and updates efficiently.

This is why multivariate linear algebra is essential.

Think of ML as:

  • Data → points in high-dimensional space

  • Features → axes

  • Models → transformations

  • Training → adjusting geometry

  • Loss → distance between vectors

  • Optimization → walking downhill in space

  • Gradient descent becomes directional movement

  • Regularization becomes shrinking vector norms

  • Overfitting becomes high-dimensional distortion

  • RAG embeddings become spatial similarity

  • Attention becomes weighted projection


Mathematical object

A mathematical object is an abstract concept that can be assigned to a symbol as a value, and can therefore appear in formulas.

  • Examples: numbers, expressions, shapes, functions, and sets.
  • Complex objects: theorems, proofs.

Vectors ($\vec{x}$)

A vector is an ordered list of numbers.

  • Latin: vector , meaning "carrier" or "driver"

It is:

  • A direction
  • A magnitude
  • A point in high-dimensional space

In ML, vectors represent:

  • A data point → a vector
  • A feature column → a direction
  • A model weight vector → a direction of best fit

Column Vector (most common in ML)

A point in n dimensions.

$$x = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}$$

Dimension: $n \times 1$


Matrices

A matrix is a 2D array of numbers or table of numbers.

Dimension:

$$m \times n$$

Where:

  • $m$ = rows
  • $n$ = columns

Element notation:

$$A_{ij}$$

means element in row $i$, column $j$.
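As a small illustration (using NumPy, which the post does not name but which matches its row-by-column conventions), note that the math notation $A_{ij}$ is 1-based while array indexing is 0-based:

```python
import numpy as np

# A 2x3 matrix: m = 2 rows, n = 3 columns
A = np.array([[1, 2, 3],
              [4, 5, 6]])

print(A.shape)   # (2, 3) -> m x n

# Math notation A_ij is 1-based; NumPy indexing is 0-based,
# so the element in row 2, column 3 (A_23 = 6) is A[1, 2].
print(A[1, 2])   # 6
```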

Tensor

A tensor is an algebraic object that describes a multilinear relationship between sets of algebraic objects associated with a vector space.

  • Latin: tendere meaning 'to stretch'

A matrix is a transformation of space.

All machine learning models are compositions of transformations.

If:

$$y = Ax$$

Then $A$ transforms vector $x$ into a new vector $y$.

Geometrically, a matrix can:

  • Stretch
  • Compress
  • Rotate
  • Reflect
  • Shear
  • Project
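A minimal sketch of one of these transformations, a rotation, written with NumPy (an assumption; any array library would do). Applying $y = Ax$ with a 90° rotation matrix moves the unit vector along the x-axis onto the y-axis:

```python
import numpy as np

angle = np.pi / 2  # rotate 90 degrees counter-clockwise
R = np.array([[np.cos(angle), -np.sin(angle)],
              [np.sin(angle),  np.cos(angle)]])

x = np.array([1.0, 0.0])   # unit vector along the x-axis
y = R @ x                  # y = Ax: the matrix moves the vector

print(np.round(y, 6))      # rotated onto the y-axis: [0. 1.]
```

Stretching, reflection, and shearing are just different choices of the matrix entries.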

Linear Regression = Projection

When solving linear regression:

$$\hat{\theta} = (X^T X)^{-1} X^T y$$

You are not just solving equations.

You are projecting vector ( y ) onto the column space of ( X ).

Meaning:

  • ( X ) defines a subspace (all linear combinations of features)
  • ( y ) may not lie in that space
  • We find the closest point in that space

This closest point is the orthogonal projection.

The residual error is perpendicular to the feature space.

Mathematically:

$$X^T (y - X\hat{\theta}) = 0$$

Geometric meaning:

The error vector is orthogonal to every feature direction.
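This orthogonality can be checked numerically. A sketch with NumPy (random data is purely illustrative; `lstsq` computes the same least-squares fit as the Normal Equation):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))    # 20 examples, 3 features
y = rng.normal(size=20)         # target; generally not in the column space of X

# Least-squares fit: the projection of y onto the column space of X
theta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

residual = y - X @ theta_hat
# The residual is orthogonal to every column of X: X^T (y - X theta) = 0
print(np.allclose(X.T @ residual, 0))   # True
```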

Orthogonality = Independence

Orthogonality represents perpendicularity: elements (vectors, functions, or data) are independent and have a dot product or inner product of zero.

Two vectors are orthogonal if:

$$v^T w = 0$$

Equivalently, their dot product is zero, i.e. they form a 90° angle:

$$v \cdot w = 0$$

In ML, orthogonality means:

  • No linear dependence
  • No shared directional information

This is why:

  • PCA finds orthogonal principal components
  • QR decomposition builds orthogonal bases
  • SVD decomposes into orthogonal directions

Orthogonality reduces redundancy.
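The bullets above can be verified directly. A short NumPy sketch (the specific vectors are illustrative) checking a zero dot product, and that QR decomposition produces an orthonormal basis with $Q^T Q = I$:

```python
import numpy as np

v = np.array([1.0, 2.0])
w = np.array([-2.0, 1.0])
print(v @ w)                 # 0.0 -> v and w are orthogonal

# QR decomposition builds an orthonormal basis: Q^T Q = I
A = np.random.default_rng(1).normal(size=(4, 3))
Q, R = np.linalg.qr(A)
print(np.allclose(Q.T @ Q, np.eye(3)))   # True
```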

Eigenvectors

Eigenvectors are non-zero vectors associated with a linear transformation (represented by a square matrix) that do not change direction when that transformation is applied.

Instead, they are only

  • stretched
  • compressed
  • reversed

by a scalar factor known as the eigenvalue $\lambda$.

If:

$$Av = \lambda v$$

That means:

  • Applying transformation $A$
  • Does not change the direction of $v$
  • Only scales it by $\lambda$

Eigenvectors are:

  • Directions that remain stable under transformation.
  • Natural Directions of a Transformation

In ML:

  • PCA uses eigenvectors of the covariance matrix
  • These represent directions of maximum variance
  • Each eigenvector = principal axis of data

Eigenvalues tell you how important that direction is.
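The defining equation $Av = \lambda v$ is easy to verify numerically. A sketch using NumPy's `eig` (the diagonal matrix is a deliberately simple stretching transformation):

```python
import numpy as np

A = np.array([[2.0, 0.0],
              [0.0, 3.0]])   # a simple diagonal (stretching) matrix

eigenvalues, eigenvectors = np.linalg.eig(A)

# Each column of `eigenvectors` satisfies A v = lambda v:
for lam, vec in zip(eigenvalues, eigenvectors.T):
    print(np.allclose(A @ vec, lam * vec))   # True for every pair
```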

Singular Value Decomposition (SVD)

SVD is a factorization of a real or complex matrix into a rotation, followed by a rescaling, followed by another rotation.

SVD states:

$$X = U \Sigma V^T$$

Geometrically:

  1. $V^T$ rotates the space
  2. $\Sigma$ scales each axis
  3. $U$ rotates again

So any matrix transformation can be seen as:

Rotate → Stretch → Rotate

This is why SVD is foundational in:

  • Dimensionality reduction
  • Embeddings
  • LLM weight compression
  • Recommender systems
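A minimal NumPy sketch of the factorization (random data for illustration): the three factors reconstruct the matrix exactly, and truncating to the largest singular value gives the best rank-1 approximation, which is the core idea behind SVD-based dimensionality reduction and compression.

```python
import numpy as np

X = np.random.default_rng(2).normal(size=(5, 3))
U, s, Vt = np.linalg.svd(X, full_matrices=False)

# Rotate -> stretch -> rotate reconstructs the original matrix
print(np.allclose(U @ np.diag(s) @ Vt, X))    # True

# Keeping only the largest singular value gives the best rank-1 approximation
X_rank1 = s[0] * np.outer(U[:, 0], Vt[0, :])
print(X_rank1.shape)                          # (5, 3)
```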

Neural Networks as Layered Transformations

Each layer computes:

$$z = Wx + b$$

Which is:

  1. Linear transformation ( W )
  2. Shift ( b )
  3. Nonlinear activation

Geometrically:

  • Linear layers reshape space
  • Activations bend space
  • Deep networks progressively warp geometry

Training adjusts ( W ) so that:

  • Classes become linearly separable
  • Desired outputs align with target directions

Deep learning is geometry engineering.
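The three steps above can be sketched in a few lines of NumPy. ReLU is assumed as the activation here (the post does not pick one), and the shapes are illustrative:

```python
import numpy as np

def layer(x, W, b):
    """One layer: linear transform, shift, then a nonlinear bend."""
    z = W @ x + b            # linear transformation + shift
    return np.maximum(z, 0)  # nonlinear activation (ReLU, assumed)

rng = np.random.default_rng(3)
x = rng.normal(size=4)           # input vector (4 features)
W = rng.normal(size=(3, 4))      # weight matrix: maps 4-D space to 3-D
b = rng.normal(size=3)

print(layer(x, W, b).shape)      # (3,) -> the layer reshaped the space
```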


4. Data Matrix in Machine Learning

If we have:

  • $m$ training examples
  • $n$ features

The data matrix is:

$$X = \begin{bmatrix} \text{---}\, x^{(1)} \,\text{---} \\ \text{---}\, x^{(2)} \,\text{---} \\ \vdots \\ \text{---}\, x^{(m)} \,\text{---} \end{bmatrix}$$

Dimension:

$$m \times n$$

Each row = one training example
Each column = one feature


5. Matrix-Vector Multiplication

If:

  • $X$ is $m \times n$
  • $\theta$ is $n \times 1$

Then:

$$X\theta$$

Produces:

$$m \times 1$$

This gives predictions for all training examples in one operation.

This is vectorization — much faster than loops.
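The shape rule can be checked directly. A NumPy sketch (dimensions chosen for illustration) showing that one product matches the per-example loop:

```python
import numpy as np

m, n = 5, 3
X = np.random.default_rng(4).normal(size=(m, n))   # m examples, n features
theta = np.random.default_rng(5).normal(size=n)    # parameter vector

predictions = X @ theta      # all m predictions in one operation
print(predictions.shape)     # (5,) -> one prediction per training example

# Same result as predicting one example at a time:
print(np.allclose(predictions, [X[i] @ theta for i in range(m)]))   # True
```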


6. Hypothesis in Matrix Form

Linear regression hypothesis:

$$h_\theta(x) = \theta^T x$$

For all training examples:

$$h = X\theta$$

This is why matrix multiplication is central in ML.


7. Transpose ($A^T$)

Transpose swaps rows and columns.

If:

$$A \in \mathbb{R}^{m \times n}$$

Then:

$$A^T \in \mathbb{R}^{n \times m}$$

Element-wise:

$$(A^T)_{ij} = A_{ji}$$

Used heavily in:

  • Normal Equation
  • Gradient derivations
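Both the shape swap and the element-wise rule are quick to confirm with NumPy (the example matrix is arbitrary):

```python
import numpy as np

A = np.arange(6).reshape(2, 3)   # A is 2x3
print(A.T.shape)                 # (3, 2) -> rows and columns swapped

# (A^T)_ij = A_ji for every element, e.g.:
print(A.T[2, 1] == A[1, 2])      # True
```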

8. Identity Matrix

Identity matrix:

$$I = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$$

Property:

$$AI = IA = A$$

9. Inverse Matrix

Matrix inverse satisfies:

$$A^{-1}A = AA^{-1} = I$$

Used in Normal Equation:

$$\theta = (X^T X)^{-1} X^T y$$

Important:

  • Not all matrices have inverses
  • If determinant = 0 → matrix is singular
  • Singular matrix → no inverse
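A sketch of the Normal Equation in NumPy (random data for illustration). One practical consideration: forming the explicit inverse is rarely done in real code; solving the linear system gives the same $\theta$ more cheaply and more stably:

```python
import numpy as np

rng = np.random.default_rng(6)
X = rng.normal(size=(10, 3))
y = rng.normal(size=10)

# Explicit inverse, exactly as the formula is written:
theta_inv = np.linalg.inv(X.T @ X) @ X.T @ y

# Preferred in practice: solve the system (X^T X) theta = X^T y directly.
# It avoids forming the inverse and fails loudly if X^T X is singular.
theta_solve = np.linalg.solve(X.T @ X, X.T @ y)

print(np.allclose(theta_inv, theta_solve))   # True
```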

10. Determinant (Conceptual View)

The determinant measures:

  • Whether a matrix is invertible
  • Volume scaling factor

If:

$$\det(A) = 0$$

Then:

  • Matrix is singular
  • Inverse does not exist
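A quick NumPy illustration: a matrix whose rows are linearly dependent has zero determinant, and attempting to invert it raises an error.

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [2.0, 4.0]])   # second row = 2x the first -> linearly dependent

print(np.isclose(np.linalg.det(A), 0.0))   # True -> singular

try:
    np.linalg.inv(A)
except np.linalg.LinAlgError:
    print("singular matrix: no inverse exists")
```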

11. Why Vectorization Matters

Instead of:

for each training example: compute prediction

We compute:

$$X\theta$$

Benefits:

  • Faster computation
  • Clean mathematical formulation
  • Optimized hardware usage (CPU/GPU)

Modern ML frameworks rely entirely on vectorized operations.
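To see the performance difference for yourself, a small timing sketch (NumPy assumed; exact timings depend on your hardware, so none are claimed here):

```python
import time
import numpy as np

m, n = 50_000, 20
X = np.random.default_rng(7).normal(size=(m, n))
theta = np.random.default_rng(8).normal(size=n)

t0 = time.perf_counter()
loop_preds = np.array([X[i] @ theta for i in range(m)])  # one example at a time
t1 = time.perf_counter()
vec_preds = X @ theta                                    # whole batch at once
t2 = time.perf_counter()

print(np.allclose(loop_preds, vec_preds))                # True: same numbers
print(f"loop: {t1 - t0:.4f}s  vectorized: {t2 - t1:.4f}s")
```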


12. Matrix-Matrix Multiplication

If:

  • $A$ is $m \times n$
  • $B$ is $n \times p$

Then:

$$AB$$

Results in:

$$m \times p$$

Properties:

  • Not commutative: $AB \ne BA$
  • Associative: $(AB)C = A(BC)$
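Both properties can be checked numerically. A NumPy sketch with arbitrary random matrices:

```python
import numpy as np

rng = np.random.default_rng(9)
A = rng.normal(size=(2, 3))
B = rng.normal(size=(3, 4))
C = rng.normal(size=(4, 2))

print((A @ B).shape)                          # (2, 4): (m x n)(n x p) -> m x p
print(np.allclose((A @ B) @ C, A @ (B @ C)))  # True: associative

# Not commutative: even for square matrices, AB generally differs from BA
S, T = rng.normal(size=(2, 2)), rng.normal(size=(2, 2))
print(np.allclose(S @ T, T @ S))              # False
```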

13. From Linear Algebra to Deep Learning

Everything in deep learning is matrix multiplication:

  • Inputs × Weights
  • Weights × Activations
  • Gradient updates

Neural network forward pass:

$$Z = WX + b$$

Backpropagation is also matrix calculus.

Understanding multivariate linear algebra makes deep learning much easier to grasp.


14. Summary

Key ideas:

  • Vectors represent features and parameters
  • Matrices represent datasets
  • Matrix multiplication enables fast prediction
  • Transpose and inverse enable optimization
  • Vectorization is essential for performance

Linear algebra is not optional in ML — it is foundational.


Next in the Series

In the next article, we will explore:

Optimization in Machine Learning: Gradient Descent Variants and Convergence
