



☠️ Advanced Multivariate Linear Algebra 2

Detailed explanation of the Normal Equation for linear regression, including matrix formulation, closed-form solution, comparison with gradient descent, and practical considerations for implementation.

Hitesh Sahu
Written by Hitesh Sahu, a passionate developer and blogger.

Fri Feb 27 2026


☠️ Advanced Multivariate Linear Algebra 2

💀 When Geometry Becomes Dangerous


Orthogonality = Independence

Orthogonality means perpendicularity: two elements (vectors, functions, or data) are orthogonal, and carry independent information, when their dot product (inner product) is zero.

Two vectors are orthogonal if:

$v^T w = 0$

That is, their dot product is zero; equivalently, they form a 90° angle (they are perpendicular).

$v \cdot w = 0$

In ML, orthogonality means:

  • No linear dependence
  • No shared directional information

This is why:

  • PCA finds orthogonal principal components
  • QR decomposition builds orthogonal bases
  • SVD decomposes into orthogonal directions

Orthogonality reduces redundancy.
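A minimal NumPy sketch of this idea, with two illustrative vectors (not from the post): their dot product is zero, so the angle between them is 90°.

```python
import numpy as np

# Two perpendicular directions in the plane (illustrative values).
v = np.array([1.0, 2.0])
w = np.array([-2.0, 1.0])

# Orthogonal: the dot product is zero, so the vectors share no directional information.
print(np.dot(v, w))  # 0.0

# Equivalently, the angle between them is 90 degrees.
cos_theta = np.dot(v, w) / (np.linalg.norm(v) * np.linalg.norm(w))
print(np.degrees(np.arccos(cos_theta)))  # 90.0
```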

Eigenvectors

Eigenvectors are non-zero vectors associated with a linear transformation (represented by a square matrix) that do not change their direction when that transformation is applied.

Instead, they are only

  • stretched
  • compressed
  • reversed

by a scalar factor known as the eigenvalue $\lambda$.

If:

$Av = \lambda v$

That means:

  • Applying transformation $A$
  • Does not change the direction of $v$
  • Only scales it by $\lambda$

Eigenvectors are:

  • Directions that remain stable under a transformation
  • Natural directions of a transformation

In ML:

  • PCA uses eigenvectors of the covariance matrix
  • These represent directions of maximum variance
  • Each eigenvector = principal axis of data

Eigenvalues tell you how important that direction is.
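A short NumPy sketch of the eigenvector equation on a symmetric, covariance-like matrix (the values are illustrative): each eigenvector satisfies $Av = \lambda v$, and the larger eigenvalue marks the more important direction.

```python
import numpy as np

# A symmetric (covariance-like) matrix; values are illustrative.
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

# eigh is the eigendecomposition for symmetric matrices;
# it returns eigenvalues in ascending order.
eigenvalues, eigenvectors = np.linalg.eigh(A)

# Each column v of `eigenvectors` satisfies A v = lambda v:
for lam, v in zip(eigenvalues, eigenvectors.T):
    assert np.allclose(A @ v, lam * v)

print(eigenvalues)  # [1. 3.]: the eigenvalue-3 direction dominates
```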

Singular Value Decomposition (SVD)

Factorization of a real or complex matrix into a rotation, followed by a rescaling, followed by another rotation.

SVD states:

$X = U \Sigma V^T$

Geometrically:

  1. $V^T$ rotates the space
  2. $\Sigma$ scales each axis
  3. $U$ rotates again

So any matrix transformation can be seen as:

Rotate → Stretch → Rotate

This is why SVD is foundational in:

  • Dimensionality reduction
  • Embeddings
  • LLM weight compression
  • Recommender systems
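The decomposition and its use for low-rank compression can be sketched in a few lines of NumPy; the data matrix here is illustrative.

```python
import numpy as np

# A small data matrix (illustrative values).
X = np.array([[3.0, 1.0, 1.0],
              [-1.0, 3.0, 1.0]])

U, s, Vt = np.linalg.svd(X, full_matrices=False)

# The factorization reconstructs X exactly: rotate (Vt), scale (s), rotate (U).
assert np.allclose(U @ np.diag(s) @ Vt, X)

# Truncating to the largest singular value gives the best rank-1 approximation,
# the core trick behind dimensionality reduction and weight compression.
X1 = s[0] * np.outer(U[:, 0], Vt[0])
print(np.linalg.norm(X - X1))  # reconstruction error of the rank-1 approximation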

Neural Networks as Layered Transformations

Each layer computes:

$z = Wx + b$

Which is:

  1. Linear transformation $W$
  2. Shift $b$
  3. Nonlinear activation

Geometrically:

  • Linear layers reshape space
  • Activations bend space
  • Deep networks progressively warp geometry

Training adjusts $W$ so that:

  • Classes become linearly separable
  • Desired outputs align with target directions

Deep learning is geometry engineering.
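A single layer of this kind is easy to sketch in NumPy. The weights below are hypothetical (not trained values); the point is the three-step recipe: linear transform, shift, bend.

```python
import numpy as np

def relu(z):
    # Nonlinear activation: bends space by zeroing negative coordinates.
    return np.maximum(0.0, z)

def layer(x, W, b):
    # Linear transformation (W), shift (b), then nonlinearity.
    return relu(W @ x + b)

# Hypothetical weights for a 2-in / 2-out layer (not trained values).
W = np.array([[1.0, -1.0],
              [0.5, 2.0]])
b = np.array([0.0, -1.0])

x = np.array([1.0, 2.0])
print(layer(x, W, b))  # W@x = [-1, 4.5]; +b = [-1, 3.5]; relu -> [0, 3.5]
```

Stacking many such layers, each reshaping and bending the space, is what lets a deep network make tangled classes linearly separable at the end.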


A matrix is a transformation of space.

All machine learning models are compositions of transformations.

If:

$y = Ax$

Then $A$ transforms vector $x$ into a new vector $y$.

Geometrically, a matrix can:

  • Stretch
  • Compress
  • Rotate
  • Reflect
  • Shear
  • Project

Linear Regression = Projection

When solving linear regression:

$\hat{\theta} = (X^T X)^{-1} X^T y$

You are not just solving equations.

You are projecting vector $y$ onto the column space of $X$.

Meaning:

  • $X$ defines a subspace (all linear combinations of features)
  • $y$ may not lie in that space
  • We find the closest point in that space

This closest point is the orthogonal projection.

The residual error is perpendicular to the feature space.

Mathematically:

$X^T (y - X\hat{\theta}) = 0$

Geometric meaning:

The error vector is orthogonal to every feature direction.
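This orthogonality can be verified numerically. The design matrix and targets below are illustrative; the sketch solves the normal equation and checks that $X^T$ annihilates the residual.

```python
import numpy as np

# Illustrative design matrix (intercept column + one feature) and targets.
X = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
y = np.array([1.0, 3.0, 2.0])

# Normal equation: solve (X^T X) theta = X^T y for theta_hat.
theta_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(theta_hat)  # [1.5, 0.5]

# The residual is orthogonal to every column of X (the feature space).
residual = y - X @ theta_hat
print(X.T @ residual)  # ~[0, 0]
```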

