
☠️ Advanced Multivariate Linear Algebra

Detailed explanation of the Normal Equation for linear regression, including matrix formulation, closed-form solution, comparison with gradient descent, and practical considerations for implementation.

Written by Hitesh Sahu

Thu Feb 19 2026

💀 When Geometry Becomes Dangerous

Linear algebra is the language of space manipulation.

Machine learning is controlled geometric transformation.

Why Linear Algebra Matters in ML

Machine learning deals with:

  • Multiple features
  • Large datasets
  • Efficient computation
  • Vectorized operations

Instead of writing nested loops, we use matrix operations to compute predictions and updates efficiently.

This is why multivariate linear algebra is essential.

Think of ML as:

  • Data → points in high-dimensional space

  • Features → axes

  • Models → transformations

  • Training → adjusting geometry

  • Loss → distance between vectors

  • Optimization → walking downhill in space

  • Gradient descent becomes directional movement

  • Regularization becomes shrinking vector norms

  • Overfitting becomes high-dimensional distortion

  • RAG embeddings become spatial similarity

  • Attention becomes weighted projection


Mathematical object

A mathematical object is an abstract concept that can be assigned to a symbol as a value, and can therefore appear in formulas.

  • Examples: numbers, expressions, shapes, functions, and sets.
  • Complex objects: theorems, proofs.

Vectors ($\vec{x}$)

A vector is an ordered list of numbers.

  • Latin: vector , meaning "carrier" or "driver"

It is:

  • A direction
  • A magnitude
  • A point in high-dimensional space

In ML, vectors represent:

  • A data point → a vector
  • A feature column → a direction
  • A model weight vector → a direction of best fit

Column Vector (most common in ML)

A point in n dimensions.

$$x = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}$$

Dimension: $n \times 1$


Matrices

A matrix is a 2D array of numbers or table of numbers.

Dimension:

$$m \times n$$

Where:

  • $m$ = rows
  • $n$ = columns

Element notation:

$$A_{ij}$$

means element in row $i$, column $j$.
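As a small illustration (using NumPy, which the post does not name but which matches its row-by-column conventions), note that the math notation $A_{ij}$ is 1-based while array indexing is 0-based:

```python
import numpy as np

# A 2x3 matrix: m = 2 rows, n = 3 columns
A = np.array([[1, 2, 3],
              [4, 5, 6]])

print(A.shape)   # (2, 3) -> m x n

# Math notation A_ij is 1-based; NumPy indexing is 0-based,
# so the element in row 2, column 3 (A_23 = 6) is A[1, 2].
print(A[1, 2])   # 6
```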

Tensor

A tensor is an algebraic object that describes a multilinear relationship between sets of algebraic objects associated with a vector space.

  • Latin: tendere meaning 'to stretch'

A matrix is a transformation of space.

All machine learning models are compositions of transformations.

If:

$$y = Ax$$

Then $A$ transforms vector $x$ into a new vector $y$.

Geometrically, a matrix can:

  • Stretch
  • Compress
  • Rotate
  • Reflect
  • Shear
  • Project
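A minimal sketch of one of these transformations, a rotation, written with NumPy (an assumption; any array library would do). Applying $y = Ax$ with a 90° rotation matrix moves the unit vector along the x-axis onto the y-axis:

```python
import numpy as np

angle = np.pi / 2  # rotate 90 degrees counter-clockwise
R = np.array([[np.cos(angle), -np.sin(angle)],
              [np.sin(angle),  np.cos(angle)]])

x = np.array([1.0, 0.0])   # unit vector along the x-axis
y = R @ x                  # y = Ax: the matrix moves the vector

print(np.round(y, 6))      # rotated onto the y-axis: [0. 1.]
```

Stretching, reflection, and shearing are just different choices of the matrix entries.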

Linear Regression = Projection

When solving linear regression:

$$\hat{\theta} = (X^T X)^{-1} X^T y$$

You are not just solving equations.

You are projecting vector ( y ) onto the column space of ( X ).

Meaning:

  • ( X ) defines a subspace (all linear combinations of features)
  • ( y ) may not lie in that space
  • We find the closest point in that space

This closest point is the orthogonal projection.

The residual error is perpendicular to the feature space.

Mathematically:

$$X^T (y - X\hat{\theta}) = 0$$

Geometric meaning:

The error vector is orthogonal to every feature direction.
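This orthogonality can be checked numerically. A sketch with NumPy (random data is purely illustrative; `lstsq` computes the same least-squares fit as the Normal Equation):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))    # 20 examples, 3 features
y = rng.normal(size=20)         # target; generally not in the column space of X

# Least-squares fit: the projection of y onto the column space of X
theta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

residual = y - X @ theta_hat
# The residual is orthogonal to every column of X: X^T (y - X theta) = 0
print(np.allclose(X.T @ residual, 0))   # True
```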

Orthogonality = Independence

Orthogonality represents perpendicularity: elements (vectors, functions, or data) are independent and have a dot product or inner product of zero.

Two vectors are orthogonal if:

$$v^T w = 0$$

Equivalently, their dot product is zero, i.e. they form a 90° angle:

$$v \cdot w = 0$$

In ML, orthogonality means:

  • No linear dependence
  • No shared directional information

This is why:

  • PCA finds orthogonal principal components
  • QR decomposition builds orthogonal bases
  • SVD decomposes into orthogonal directions

Orthogonality reduces redundancy.
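The bullets above can be verified directly. A short NumPy sketch (the specific vectors are illustrative) checking a zero dot product, and that QR decomposition produces an orthonormal basis with $Q^T Q = I$:

```python
import numpy as np

v = np.array([1.0, 2.0])
w = np.array([-2.0, 1.0])
print(v @ w)                 # 0.0 -> v and w are orthogonal

# QR decomposition builds an orthonormal basis: Q^T Q = I
A = np.random.default_rng(1).normal(size=(4, 3))
Q, R = np.linalg.qr(A)
print(np.allclose(Q.T @ Q, np.eye(3)))   # True
```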

Eigenvectors

Eigenvectors are non-zero vectors associated with a linear transformation (represented by a square matrix) that do not change direction when that transformation is applied.

Instead, they are only

  • stretched
  • compressed
  • reversed

by a scalar factor known as the eigenvalue $\lambda$.

If:

$$Av = \lambda v$$

That means:

  • Applying transformation $A$
  • Does not change the direction of $v$
  • Only scales it by $\lambda$

Eigenvectors are:

  • Directions that remain stable under transformation.
  • Natural Directions of a Transformation

In ML:

  • PCA uses eigenvectors of the covariance matrix
  • These represent directions of maximum variance
  • Each eigenvector = principal axis of data

Eigenvalues tell you how important that direction is.
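The defining equation $Av = \lambda v$ is easy to verify numerically. A sketch using NumPy's `eig` (the diagonal matrix is a deliberately simple stretching transformation):

```python
import numpy as np

A = np.array([[2.0, 0.0],
              [0.0, 3.0]])   # a simple diagonal (stretching) matrix

eigenvalues, eigenvectors = np.linalg.eig(A)

# Each column of `eigenvectors` satisfies A v = lambda v:
for lam, vec in zip(eigenvalues, eigenvectors.T):
    print(np.allclose(A @ vec, lam * vec))   # True for every pair
```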

Singular Value Decomposition (SVD)

SVD is a factorization of a real or complex matrix into a rotation, followed by a rescaling, followed by another rotation.

SVD states:

$$X = U \Sigma V^T$$

Geometrically:

  1. $V^T$ rotates the space
  2. $\Sigma$ scales each axis
  3. $U$ rotates again

So any matrix transformation can be seen as:

Rotate → Stretch → Rotate

This is why SVD is foundational in:

  • Dimensionality reduction
  • Embeddings
  • LLM weight compression
  • Recommender systems
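A minimal NumPy sketch of the factorization (random data for illustration): the three factors reconstruct the matrix exactly, and truncating to the largest singular value gives the best rank-1 approximation, which is the core idea behind SVD-based dimensionality reduction and compression.

```python
import numpy as np

X = np.random.default_rng(2).normal(size=(5, 3))
U, s, Vt = np.linalg.svd(X, full_matrices=False)

# Rotate -> stretch -> rotate reconstructs the original matrix
print(np.allclose(U @ np.diag(s) @ Vt, X))    # True

# Keeping only the largest singular value gives the best rank-1 approximation
X_rank1 = s[0] * np.outer(U[:, 0], Vt[0, :])
print(X_rank1.shape)                          # (5, 3)
```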

Neural Networks as Layered Transformations

Each layer computes:

$$z = Wx + b$$

Which is:

  1. Linear transformation ( W )
  2. Shift ( b )
  3. Nonlinear activation

Geometrically:

  • Linear layers reshape space
  • Activations bend space
  • Deep networks progressively warp geometry

Training adjusts ( W ) so that:

  • Classes become linearly separable
  • Desired outputs align with target directions

Deep learning is geometry engineering.
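The three steps above can be sketched in a few lines of NumPy. ReLU is assumed as the activation here (the post does not pick one), and the shapes are illustrative:

```python
import numpy as np

def layer(x, W, b):
    """One layer: linear transform, shift, then a nonlinear bend."""
    z = W @ x + b            # linear transformation + shift
    return np.maximum(z, 0)  # nonlinear activation (ReLU, assumed)

rng = np.random.default_rng(3)
x = rng.normal(size=4)           # input vector (4 features)
W = rng.normal(size=(3, 4))      # weight matrix: maps 4-D space to 3-D
b = rng.normal(size=3)

print(layer(x, W, b).shape)      # (3,) -> the layer reshaped the space
```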


4. Data Matrix in Machine Learning

If we have:

  • $m$ training examples
  • $n$ features

The data matrix is:

$$X = \begin{bmatrix} \text{---}\, x^{(1)} \,\text{---} \\ \text{---}\, x^{(2)} \,\text{---} \\ \vdots \\ \text{---}\, x^{(m)} \,\text{---} \end{bmatrix}$$

Dimension:

$$m \times n$$

Each row = one training example
Each column = one feature


5. Matrix-Vector Multiplication

If:

  • $X$ is $m \times n$
  • $\theta$ is $n \times 1$

Then:

$$X\theta$$

Produces:

$$m \times 1$$

This gives predictions for all training examples in one operation.

This is vectorization — much faster than loops.
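The shape rule can be checked directly. A NumPy sketch (dimensions chosen for illustration) showing that one product matches the per-example loop:

```python
import numpy as np

m, n = 5, 3
X = np.random.default_rng(4).normal(size=(m, n))   # m examples, n features
theta = np.random.default_rng(5).normal(size=n)    # parameter vector

predictions = X @ theta      # all m predictions in one operation
print(predictions.shape)     # (5,) -> one prediction per training example

# Same result as predicting one example at a time:
print(np.allclose(predictions, [X[i] @ theta for i in range(m)]))   # True
```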


6. Hypothesis in Matrix Form

Linear regression hypothesis:

$$h_\theta(x) = \theta^T x$$

For all training examples:

$$h = X\theta$$

This is why matrix multiplication is central in ML.


7. Transpose ($A^T$)

Transpose swaps rows and columns.

If:

$$A \in \mathbb{R}^{m \times n}$$

Then:

$$A^T \in \mathbb{R}^{n \times m}$$

Element-wise:

$$(A^T)_{ij} = A_{ji}$$

Used heavily in:

  • Normal Equation
  • Gradient derivations
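Both the shape swap and the element-wise rule are quick to confirm with NumPy (the example matrix is arbitrary):

```python
import numpy as np

A = np.arange(6).reshape(2, 3)   # A is 2x3
print(A.T.shape)                 # (3, 2) -> rows and columns swapped

# (A^T)_ij = A_ji for every element, e.g.:
print(A.T[2, 1] == A[1, 2])      # True
```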

8. Identity Matrix

Identity matrix:

$$I = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$$

Property:

$$AI = IA = A$$

9. Inverse Matrix

Matrix inverse satisfies:

$$A^{-1}A = AA^{-1} = I$$

Used in Normal Equation:

$$\theta = (X^T X)^{-1} X^T y$$

Important:

  • Not all matrices have inverses
  • If determinant = 0 → matrix is singular
  • Singular matrix → no inverse
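A sketch of the Normal Equation in NumPy (random data for illustration). One practical consideration: forming the explicit inverse is rarely done in real code; solving the linear system gives the same $\theta$ more cheaply and more stably:

```python
import numpy as np

rng = np.random.default_rng(6)
X = rng.normal(size=(10, 3))
y = rng.normal(size=10)

# Explicit inverse, exactly as the formula is written:
theta_inv = np.linalg.inv(X.T @ X) @ X.T @ y

# Preferred in practice: solve the system (X^T X) theta = X^T y directly.
# It avoids forming the inverse and fails loudly if X^T X is singular.
theta_solve = np.linalg.solve(X.T @ X, X.T @ y)

print(np.allclose(theta_inv, theta_solve))   # True
```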

10. Determinant (Conceptual View)

The determinant measures:

  • Whether a matrix is invertible
  • Volume scaling factor

If:

$$\det(A) = 0$$

Then:

  • Matrix is singular
  • Inverse does not exist
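A quick NumPy illustration: a matrix whose rows are linearly dependent has zero determinant, and attempting to invert it raises an error.

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [2.0, 4.0]])   # second row = 2x the first -> linearly dependent

print(np.isclose(np.linalg.det(A), 0.0))   # True -> singular

try:
    np.linalg.inv(A)
except np.linalg.LinAlgError:
    print("singular matrix: no inverse exists")
```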

11. Why Vectorization Matters

Instead of:

for each training example: compute prediction

We compute:

$$X\theta$$

Benefits:

  • Faster computation
  • Clean mathematical formulation
  • Optimized hardware usage (CPU/GPU)

Modern ML frameworks rely entirely on vectorized operations.
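To see the performance difference for yourself, a small timing sketch (NumPy assumed; exact timings depend on your hardware, so none are claimed here):

```python
import time
import numpy as np

m, n = 50_000, 20
X = np.random.default_rng(7).normal(size=(m, n))
theta = np.random.default_rng(8).normal(size=n)

t0 = time.perf_counter()
loop_preds = np.array([X[i] @ theta for i in range(m)])  # one example at a time
t1 = time.perf_counter()
vec_preds = X @ theta                                    # whole batch at once
t2 = time.perf_counter()

print(np.allclose(loop_preds, vec_preds))                # True: same numbers
print(f"loop: {t1 - t0:.4f}s  vectorized: {t2 - t1:.4f}s")
```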


12. Matrix-Matrix Multiplication

If:

  • $A$ is $m \times n$
  • $B$ is $n \times p$

Then:

$$AB$$

Results in:

$$m \times p$$

Properties:

  • Not commutative: $AB \ne BA$
  • Associative: $(AB)C = A(BC)$
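Both properties can be checked numerically. A NumPy sketch with arbitrary random matrices:

```python
import numpy as np

rng = np.random.default_rng(9)
A = rng.normal(size=(2, 3))
B = rng.normal(size=(3, 4))
C = rng.normal(size=(4, 2))

print((A @ B).shape)                          # (2, 4): (m x n)(n x p) -> m x p
print(np.allclose((A @ B) @ C, A @ (B @ C)))  # True: associative

# Not commutative: even for square matrices, AB generally differs from BA
S, T = rng.normal(size=(2, 2)), rng.normal(size=(2, 2))
print(np.allclose(S @ T, T @ S))              # False
```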

13. From Linear Algebra to Deep Learning

Everything in deep learning is matrix multiplication:

  • Inputs × Weights
  • Weights × Activations
  • Gradient updates

Neural network forward pass:

$$Z = WX + b$$

Backpropagation is also matrix calculus.

Understanding multivariate linear algebra makes deep learning much easier to grasp.


14. Summary

Key ideas:

  • Vectors represent features and parameters
  • Matrices represent datasets
  • Matrix multiplication enables fast prediction
  • Transpose and inverse enable optimization
  • Vectorization is essential for performance

Linear algebra is not optional in ML — it is foundational.


Next in the Series

In the next article, we will explore:

Optimization in Machine Learning: Gradient Descent Variants and Convergence
