☠️ Advanced Multivariate Linear Algebra
A geometric tour of the linear algebra behind machine learning: vectors, matrices, projections, orthogonality, eigenvectors, SVD, and the matrix formulation of linear regression, including the Normal Equation and practical considerations for implementation.
💀 When Geometry Becomes Dangerous
Linear algebra is the language of space manipulation.
Machine learning is controlled geometric transformation.
Why Linear Algebra Matters in ML
Machine learning deals with:
- Multiple features
- Large datasets
- Efficient computation
- Vectorized operations
Instead of writing nested loops, we use matrix operations to compute predictions and updates efficiently.
This is why multivariate linear algebra is essential.
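The loops-versus-matrices claim above can be sketched in NumPy (the data and weights here are illustrative, not from the article):

```python
import numpy as np

# Hypothetical data: 4 examples, 3 features
X = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0],
              [7.0, 8.0, 9.0],
              [1.0, 0.0, 1.0]])
theta = np.array([0.5, -1.0, 2.0])

# Loop version: one prediction at a time
preds_loop = np.array([sum(X[i, j] * theta[j] for j in range(X.shape[1]))
                       for i in range(X.shape[0])])

# Vectorized version: a single matrix-vector product
preds_vec = X @ theta

print(preds_vec)   # same numbers as the loop, in one operation
```

Both produce identical predictions; the vectorized form is what ML frameworks execute on optimized hardware.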
Think of ML as:
- Data → points in high-dimensional space
- Features → axes
- Models → transformations
- Training → adjusting geometry
- Loss → distance between vectors
- Optimization → walking downhill in space
- Gradient descent becomes directional movement
- Regularization becomes shrinking vector norms
- Overfitting becomes high-dimensional distortion
- RAG embeddings become spatial similarity
- Attention becomes weighted projection
Mathematical object
A mathematical object is an abstract concept that can be assigned to a symbol as a value, and can therefore appear in formulas.
- Examples: numbers, expressions, shapes, functions, and sets.
- More complex objects: theorems and proofs.

Vectors
A vector is an ordered list of numbers.
- Latin: vector, meaning "carrier" or "driver"
It is:
- A direction
- A magnitude
- A point in high-dimensional space
In ML, vectors represent:
- A data point → a vector
- A feature column → a direction
- A model weight vector → a direction of best fit
Column Vector (most common in ML)
A point in $n$ dimensions:

$x = [x_1, x_2, \ldots, x_n]^T$

Dimension: $n \times 1$
Matrices
A matrix is a 2D array (table) of numbers.
Dimension: $m \times n$
Where:
- $m$ = rows
- $n$ = columns
Element notation:
$A_{ij}$ means the element in row $i$, column $j$.
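A minimal NumPy illustration of shape and element notation (note that NumPy indexes from 0, while the math notation above indexes from 1):

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6]])   # m = 2 rows, n = 3 columns

print(A.shape)    # (2, 3)
print(A[0, 2])    # element in row 1, column 3 (0-indexed as [0, 2]): 3
```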
Tensor
An algebraic object that describes a multilinear relationship between sets of algebraic objects associated with a vector space.
- Latin: tendere, meaning 'to stretch'
A matrix is a transformation of space.
All machine learning models are compositions of transformations.
If:

$y = Ax$

Then $A$ transforms the vector $x$ into a new vector $y$.
Geometrically, a matrix can:
- Stretch
- Compress
- Rotate
- Reflect
- Shear
- Project
Linear Regression = Projection
When solving linear regression:

$X\theta \approx y$

You are not just solving equations.
You are projecting the vector $y$ onto the column space of $X$.
Meaning:
- $X$ defines a subspace (all linear combinations of features)
- $y$ may not lie in that space
- We find the closest point in that space
This closest point is the orthogonal projection.
The residual error is perpendicular to the feature space.
Mathematically:

$X^T (y - X\theta) = 0$
Geometric meaning:
The error vector is orthogonal to every feature direction.
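This orthogonality can be checked numerically. A sketch with randomly generated data (illustrative, not from the article): solve least squares, then verify the residual is perpendicular to every column of $X$.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))     # feature matrix (illustrative data)
y = rng.normal(size=20)          # target, generally not in the column space

# Least squares projects y onto the column space of X
theta, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ theta                # orthogonal projection of y
residual = y - y_hat

# The residual is orthogonal to every feature direction
print(X.T @ residual)            # ~[0, 0, 0] up to floating-point error
```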
Orthogonality = Independence
Orthogonality generalizes perpendicularity: elements (vectors, functions, or data) are independent and have a dot product or inner product of zero.
Two vectors $u$ and $v$ are orthogonal if:

$u \cdot v = u^T v = 0$

i.e. they form a 90° angle (are perpendicular).
In ML, orthogonality means:
- No linear dependence
- No shared directional information
This is why:
- PCA finds orthogonal principal components
- QR decomposition builds orthogonal bases
- SVD decomposes into orthogonal directions
Orthogonality reduces redundancy.
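A quick numerical check of the zero-dot-product definition (vectors chosen for illustration):

```python
import numpy as np

u = np.array([1.0, 2.0])
v = np.array([-2.0, 1.0])   # perpendicular to u in the plane

print(np.dot(u, v))         # 0.0 -> orthogonal: no shared direction
```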
Eigenvectors
Eigenvectors are non-zero vectors associated with a linear transformation (represented by a square matrix) that do not change direction when that transformation is applied.
Instead, they are only:
- stretched
- compressed
- reversed

by a scalar factor known as the eigenvalue.
If:

$Av = \lambda v$

That means:
- Applying the transformation $A$ to $v$
- Does not change the direction of $v$
- Only scales it by $\lambda$
Eigenvectors are:
- Directions that remain stable under transformation.
- Natural Directions of a Transformation
In ML:
- PCA uses eigenvectors of the covariance matrix
- These represent directions of maximum variance
- Each eigenvector = principal axis of data
Eigenvalues tell you how important that direction is.
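The PCA connection can be sketched with NumPy's `eigh` (for symmetric matrices); the correlated 2-D data here is fabricated for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
# Illustrative 2-D data with most variance along one direction
base = rng.normal(size=(200, 1))
noise = rng.normal(size=(200, 1))
data = np.hstack([base, 0.9 * base + 0.1 * noise])

cov = np.cov(data, rowvar=False)          # 2x2 covariance matrix
eigvals, eigvecs = np.linalg.eigh(cov)    # eigh: for symmetric matrices

# The eigenvector with the largest eigenvalue is the first principal axis
principal_axis = eigvecs[:, np.argmax(eigvals)]
print(eigvals)           # variance captured along each eigenvector
print(principal_axis)    # direction of maximum variance
```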
Singular Value Decomposition (SVD)
A factorization of a real or complex matrix into a rotation, followed by a rescaling, followed by another rotation.
SVD states:

$A = U \Sigma V^T$

Geometrically:
- $V^T$ rotates the space
- $\Sigma$ scales each axis
- $U$ rotates again
So any matrix transformation can be seen as:
Rotate → Stretch → Rotate
This is why SVD is foundational in:
- Dimensionality reduction
- Embeddings
- LLM weight compression
- Recommender systems
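A sketch of SVD and the low-rank truncation behind dimensionality reduction and weight compression (random matrix for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.normal(size=(6, 4))

# Thin SVD: U is 6x4, singular values s (length 4), Vt is 4x4
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Exact reconstruction: rotate -> stretch -> rotate
A_rebuilt = U @ np.diag(s) @ Vt

# Rank-2 approximation: keep only the two largest singular values
k = 2
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]
print(np.linalg.norm(A - A_k))   # Frobenius error from the dropped directions
```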
Neural Networks as Layered Transformations
Each layer computes:

$a = \sigma(Wx + b)$

Which is:
- A linear transformation $W$
- A shift $b$
- A nonlinear activation $\sigma$
Geometrically:
- Linear layers reshape space
- Activations bend space
- Deep networks progressively warp geometry
Training adjusts $W$ so that:
- Classes become linearly separable
- Desired outputs align with target directions
Deep learning is geometry engineering.
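A minimal two-layer forward pass, with random weights purely for illustration (ReLU standing in for the activation):

```python
import numpy as np

def relu(z):
    # Bends space: negative coordinates are flattened to zero
    return np.maximum(0.0, z)

rng = np.random.default_rng(3)
x = rng.normal(size=3)                          # input: 3 features

W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)   # layer 1: 3 -> 4
W2, b2 = rng.normal(size=(2, 4)), np.zeros(2)   # layer 2: 4 -> 2

# Each layer: linear reshape (W), shift (b), nonlinear bend (ReLU)
h = relu(W1 @ x + b1)
out = W2 @ h + b2
print(out.shape)   # (2,)
```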
4. Data Matrix in Machine Learning
If we have:
- $m$ training examples
- $n$ features

The data matrix $X$ stacks the training examples as rows.
Dimension: $m \times n$
Each row = one training example
Each column = one feature
5. Matrix-Vector Multiplication
If:
- $X$ is $m \times n$
- $\theta$ is $n \times 1$

Then:

$X\theta$

Produces: an $m \times 1$ vector.
This gives predictions for all training examples in one operation.
This is vectorization — much faster than loops.
6. Hypothesis in Matrix Form
Linear regression hypothesis:

$h_\theta(x) = \theta^T x$

For all training examples:

$\hat{y} = X\theta$
This is why matrix multiplication is central in ML.
7. Transpose ($A^T$)
Transpose swaps rows and columns.
If $A$ is $m \times n$, then $A^T$ is $n \times m$.
Element-wise:

$(A^T)_{ij} = A_{ji}$
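A one-line check of the swap (example matrix chosen for illustration):

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6]])        # 2 x 3
print(A.T.shape)                 # (3, 2): rows and columns swapped
print(A.T[2, 0] == A[0, 2])      # element-wise: (A^T)_ij == A_ji
```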
Used heavily in:
- Normal Equation
- Gradient derivations
8. Identity Matrix
The identity matrix $I$ is a square matrix with ones on the diagonal and zeros elsewhere.
Property:

$AI = IA = A$
9. Inverse Matrix
Matrix inverse satisfies:

$A A^{-1} = A^{-1} A = I$

Used in the Normal Equation:

$\theta = (X^T X)^{-1} X^T y$
Important:
- Not all matrices have inverses
- If determinant = 0 → matrix is singular
- Singular matrix → no inverse
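A sketch of the Normal Equation on fabricated data, including the singular-matrix caveat above; the known `true_theta` exists only so we can see that the recovered weights are close to it:

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.normal(size=(50, 3))                  # illustrative feature matrix
true_theta = np.array([1.0, -2.0, 0.5])
y = X @ true_theta + 0.01 * rng.normal(size=50)

XtX = X.T @ X
if np.linalg.det(XtX) == 0:
    # Singular: no inverse exists; fall back to the pseudo-inverse
    theta = np.linalg.pinv(X) @ y
else:
    # Normal Equation: theta = (X^T X)^{-1} X^T y
    # (solve() is numerically safer than forming the inverse explicitly)
    theta = np.linalg.solve(XtX, X.T @ y)

print(theta)   # close to [1.0, -2.0, 0.5]
```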
10. Determinant (Conceptual View)
The determinant measures:
- Whether a matrix is invertible
- Volume scaling factor
If:

$\det(A) = 0$

Then:
- Matrix is singular
- Inverse does not exist
11. Why Vectorization Matters
Instead of:
for each training example: compute prediction
We compute:

$\hat{y} = X\theta$
Benefits:
- Faster computation
- Clean mathematical formulation
- Optimized hardware usage (CPU/GPU)
Modern ML frameworks rely entirely on vectorized operations.
12. Matrix-Matrix Multiplication
If:
- is
- is
Then:
Results in:
Properties:
- Not commutative: $AB \neq BA$ in general
- Associative: $(AB)C = A(BC)$
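Both properties can be verified numerically (small example matrices, chosen for illustration):

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])
B = np.array([[0, 1],
              [1, 0]])
C = np.array([[2, 0],
              [0, 2]])

print(np.array_equal(A @ B, B @ A))              # False: not commutative
print(np.array_equal((A @ B) @ C, A @ (B @ C)))  # True: associative
```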
13. From Linear Algebra to Deep Learning
Everything in deep learning is matrix multiplication:
- Inputs × Weights
- Weights × Activations
- Gradient updates
Neural network forward pass:

$a^{(l)} = \sigma\big(W^{(l)} a^{(l-1)} + b^{(l)}\big)$
Backpropagation is also matrix calculus.
Understanding multivariate linear algebra makes deep learning much easier to grasp.
14. Summary
Key ideas:
- Vectors represent features and parameters
- Matrices represent datasets
- Matrix multiplication enables fast prediction
- Transpose and inverse enable optimization
- Vectorization is essential for performance
Linear algebra is not optional in ML — it is foundational.
Next in the Series
In the next article, we will explore:
Optimization in Machine Learning: Gradient Descent Variants and Convergence
