


️☠️ Advanced Multivariate Linear Algebra

Detailed explanation of the Normal Equation for linear regression, including matrix formulation, closed-form solution, comparison with gradient descent, and practical considerations for implementation.

Written by Hitesh Sahu, a passionate developer and blogger.

Fri Feb 27 2026


Mathematical object

A mathematical object is an abstract concept that can be a value assignable to a symbol, and can therefore appear in formulas.

  • Examples: numbers, expressions, shapes, functions, and sets.
  • Complex objects: theorems, proofs.

Tensor

An algebraic object that describes a multilinear relationship between sets of algebraic objects associated with a vector space.

  • From Latin tendere, meaning 'to stretch'

Scalar ($s \in \mathbb{R}$)

Scalars are the real numbers used in linear algebra.

  • A single number, a 0-dimensional tensor.
  • Examples: $5$, $-3.14$, $\pi$, $e$

Matrices ($A \in \mathbb{R}^{n \times m}$)

A matrix is a 2D array of numbers, i.e. a table of numbers.

$$A = \begin{bmatrix} 85 & 76 & 66 & 5 \\ 94 & 75 & 18 & 28 \\ 68 & 40 & 71 & 5 \end{bmatrix}$$

Representation:

In mathematics:

$$A \in \mathbb{R}^{3 \times 4}$$

  • Where $A$ is a real-valued matrix with 3 rows and 4 columns.

In programming:

```python
import numpy as np

A = np.array([[85, 76, 66, 5],
              [94, 75, 18, 28],
              [68, 40, 71, 5]])
```

In theory:

  • Uppercase letters (A, B, X) → Matrices
  • Lowercase letters (x, y, z) → Vectors or scalars

Dimension:

$(m \times n)$

Where:

  • $m$ = rows
  • $n$ = columns
  • Example: the matrix above is $3 \times 4$

Square Matrix

  • A matrix with the same number of rows and columns ($m = n$).

Element notation:

$A_{ij}$

  • The element in the $i$-th row and $j$-th column.

Example:

  • $A_{11} = 85$ → Row 1, Column 1
  • $A_{32} = 40$ → Row 3, Column 2
  • $A_{14} = 5$ → Row 1, Column 4
  • $A_{23} = 18$ → Row 2, Column 3
  • $A_{64}$ = undefined → a 6th row does not exist
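The one-indexed math notation above maps to zero-based indexing in NumPy; a minimal sketch using the example matrix:

```python
import numpy as np

A = np.array([[85, 76, 66, 5],
              [94, 75, 18, 28],
              [68, 40, 71, 5]])

# Math A_11 (row 1, column 1) is A[0, 0] in NumPy's zero-based indexing
print(A[0, 0])  # 85
print(A[2, 1])  # 40  (math A_32: row 3, column 2)
print(A[1, 2])  # 18  (math A_23: row 2, column 3)
```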

Use in Machine Learning

Represents Data Matrix, Model Parameters, Transformations

If we have:

  • $m$ training examples
  • $n$ features

The data matrix is:

$$X = \begin{bmatrix} \text{---}\; x^{(1)} \;\text{---} \\ \text{---}\; x^{(2)} \;\text{---} \\ \vdots \\ \text{---}\; x^{(m)} \;\text{---} \end{bmatrix}$$

Dimension $m \times n$, where:

  • Each row = one training example
  • Each column = one feature

Vectors ($\vec{x}$)

A vector is a matrix with one column.

  • Represents a point in $n$-dimensional space
  • From Latin vector, meaning "carrier" or "driver"
  • Has a direction ($\vec{x}$) and a magnitude

$$x = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}$$

Representation:

In mathematics:

$$y \in \mathbb{R}^4$$

In programming:

```python
y = np.array([460, 232, 315, 178])
```

In theory:

  • Lowercase letters (x, y, z) → Vectors

Dimension: $n \times 1$

In ML, vectors represent:

  • A data point → a vector
  • A feature column → a direction
  • A model weight vector → a direction of best fit

Example

$$y = \begin{bmatrix} 460 \\ 232 \\ 315 \\ 178 \end{bmatrix}$$

  • A $4 \times 1$ matrix, or a 4-dimensional vector

Element Indexing

$y_i$ = the $i$-th element.

  • In mathematics, indexing usually starts at 1.
  • In programming indexing often starts at 0.
  • Unless otherwise specified, assume one-indexed notation in linear algebra.

Example:

  • $y_1 = 460$
  • $y_2 = 232$
  • $y_3 = 315$
  • $y_4 = 178$

Transpose ($\mathbf{x}^T$)

Transpose swaps rows and columns.

If $A \in \mathbb{R}^{m \times n}$, then $A^T \in \mathbb{R}^{n \times m}$.

Element-wise:

$$(A^T)_{ij} = A_{ji}$$

  • A column vector becomes a row vector.

Given:

$$A = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{bmatrix}$$

Then:

$$A^T = \begin{bmatrix} 1 & 4 \\ 2 & 5 \\ 3 & 6 \end{bmatrix}$$

Used heavily in:

  • Normal Equation
  • Gradient derivations
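In NumPy the transpose is the `.T` attribute; a quick sketch with the $2 \times 3$ example above:

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6]])

# Transposing swaps the shape (m, n) -> (n, m)
print(A.shape)    # (2, 3)
print(A.T.shape)  # (3, 2)

# Element-wise property: (A^T)_ij == A_ji
assert A.T[0, 1] == A[1, 0]
```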

Identity Matrix ($I$)

The identity matrix is the matrix equivalent of the number 1.

It is a square matrix with:

  • 1’s on the diagonal
  • 0’s everywhere else

$$I = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$$

Property:

$$AI = IA = A$$
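A minimal check of this property in NumPy (`np.eye` builds the identity):

```python
import numpy as np

A = np.array([[1., 2.],
              [3., 4.]])
I = np.eye(2)  # 2x2 identity matrix

# Multiplying by the identity leaves A unchanged, on either side
assert np.allclose(A @ I, A)
assert np.allclose(I @ A, A)
```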

Inverse Matrix ($A^{-1}$)

The inverse of a matrix is like division.

  • Only square matrices can have inverses.

Matrix inverse satisfies:

$$A^{-1}A = AA^{-1} = I$$

Used in Normal Equation:

$$\theta = (X^T X)^{-1} X^T y$$

Not all square matrices are invertible.

1. Invertible / Non-Singular Matrix

A matrix that can be inverted.

  • It has an inverse if it is full rank (rows and columns are linearly independent).

2. Non-Invertible / Singular / Degenerate Matrix

A matrix that does not have an inverse.

  • It has no inverse because it is not full rank (rows or columns are linearly dependent).

Causes of a non-invertible matrix:

  • Redundant features: two features related by a linear equation $x_2 = k x_1$, e.g. size in feet and size in meters
  • More features than training examples ($m \le n$): delete some features or use regularization

Octave methods for inverting a matrix:

  • pinv(A): pseudo-inverse, computes an inverse-like matrix even if the matrix is non-invertible
  • inv(A): inverse
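The NumPy equivalents of the Octave calls above are `np.linalg.inv` and `np.linalg.pinv`; a sketch with a singular matrix (the second row is a multiple of the first, so `inv` fails but `pinv` still works):

```python
import numpy as np

# Singular matrix: row 2 = 2 * row 1, so it is not full rank
A = np.array([[1., 2.],
              [2., 4.]])

try:
    np.linalg.inv(A)
except np.linalg.LinAlgError:
    print("inv failed: matrix is singular")

# The Moore-Penrose pseudo-inverse is defined even for singular matrices
A_pinv = np.linalg.pinv(A)
assert np.allclose(A @ A_pinv @ A, A)
```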

Determinant ($\det(A)$)

The determinant tells us whether a matrix is invertible.

For a 2 × 2 matrix:

$$A = \begin{bmatrix} a & b \\ c & d \end{bmatrix} \qquad \det(A) = ad - bc$$

If $\det(A) \neq 0$: the matrix is invertible.

If $\det(A) = 0$: the matrix is singular (not invertible).

  • Either no solution
  • Or infinitely many solutions

Use in Machine Learning:

  • Normal Equation requires matrix inversion.

Closed-form solution:

$$\theta = (X^T X)^{-1} X^T y$$

  • In practice, we use numerical methods to avoid the instability of explicit matrix inversion.
  • Regularization can make a matrix invertible by adding a small value to the diagonal (Ridge Regression).
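A sketch of the closed-form solution in NumPy, comparing the literal Normal Equation against `np.linalg.lstsq` (the numerically stabler route); the data here is made up for illustration:

```python
import numpy as np

# Toy data (hypothetical): bias column of ones plus two features,
# targets generated exactly by theta_true = [0, 2, 3]
X = np.array([[1., 1., 0.],
              [1., 0., 1.],
              [1., 1., 1.],
              [1., 2., 1.]])
y = X @ np.array([0., 2., 3.])

# Normal Equation: theta = (X^T X)^{-1} X^T y
theta_normal = np.linalg.inv(X.T @ X) @ X.T @ y

# Preferred in practice: a least-squares solver avoids explicit inversion
theta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

assert np.allclose(theta_normal, theta_lstsq)
print(theta_normal)  # recovers [0. 2. 3.]
```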

A matrix is a transformation of space.

All machine learning models are compositions of transformations.

If:

$$y = Ax$$

Then $A$ transforms the vector $x$ into a new vector $y$.

Geometrically, a matrix can:

  • Stretch
  • Compress
  • Rotate
  • Reflect
  • Shear
  • Project
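A small sketch of one of these geometric actions, a rotation: the standard 2D rotation matrix applied to a unit vector.

```python
import numpy as np

angle = np.pi / 2  # 90 degrees
R = np.array([[np.cos(angle), -np.sin(angle)],
              [np.sin(angle),  np.cos(angle)]])  # rotation matrix

x = np.array([1., 0.])  # unit vector along the x-axis
y = R @ x               # the matrix transforms x into a new vector y

# Rotating (1, 0) by 90 degrees gives (0, 1)
assert np.allclose(y, [0., 1.])
```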

Matrix Addition/Subtraction

When Is Addition Allowed?

Addition is done element by element.

Two matrices can be added only if they have the same dimensions.

If:

$$A, B \in \mathbb{R}^{m \times n}$$

then:

$$C = A + B \quad \text{where} \quad (A + B)_{ij} = A_{ij} + B_{ij}$$

is also an $(m \times n)$ matrix.

$$A = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix} \qquad B = \begin{bmatrix} 5 & 6 \\ 7 & 8 \end{bmatrix}$$

$$A + B = \begin{bmatrix} 1+5 & 2+6 \\ 3+7 & 4+8 \end{bmatrix} = \begin{bmatrix} 6 & 8 \\ 10 & 12 \end{bmatrix}$$

Subtraction is the same but with minus signs.


Scalar Multiplication / Division

Scalar multiplication is multiplying every element of a matrix by a single number (scalar).
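In NumPy both element-wise addition and scalar multiplication are written with the ordinary `+` and `*` operators; a minimal sketch:

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])
B = np.array([[5, 6],
              [7, 8]])

# Element-wise addition (same dimensions required)
print(A + B)  # [[ 6  8] [10 12]]

# Scalar multiplication: every element is scaled
print(3 * A)  # [[ 3  6] [ 9 12]]
print(A / 2)  # [[0.5 1. ] [1.5 2. ]]
```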


Matrix-Matrix Multiplication

Element-wise (sum over $k$):

$$C_{ij} = \sum_{k} A_{ik} B_{kj}$$

Given 2 Matrices:

  • $A$ is $(m \times n)$, i.e. $A \in \mathbb{R}^{m \times n}$
  • $B$ is $(n \times p)$, i.e. $B \in \mathbb{R}^{n \times p}$

Then:

$$C = AB$$

where $C$ is a new matrix with dimensions:

  • $C$ is $(m \times p)$, i.e. $C \in \mathbb{R}^{m \times p}$
  • The inner dimensions must match ($n$): $(m \times n)(n \times p) \rightarrow (m \times p)$

Properties:

  • Not commutative: $AB \ne BA$; order matters
  • Associative: $(AB)C = A(BC)$
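The shape rule and both properties can be checked directly with NumPy's `@` operator:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.random((2, 3))  # (m x n)
B = rng.random((3, 4))  # (n x p)

C = A @ B
print(C.shape)  # (2, 4): the inner dimension 3 cancels

# Order matters: B @ A is not even defined here, (3,4) @ (2,3) mismatch
try:
    B @ A
except ValueError:
    print("B @ A: inner dimensions do not match")

# Associativity holds: (AB)C == A(BC)
D = rng.random((4, 2))
assert np.allclose((A @ B) @ D, A @ (B @ D))
```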

Use in Machine Learning

Everything in deep learning is matrix multiplication:

  • Inputs × Weights
  • Weights × Activations
  • Gradient updates

Neural network forward pass: $Z = WX + b$

Backpropagation is also matrix calculus.

Understanding multivariate linear algebra makes deep learning much easier to grasp.


Vectorization: Matrix-Vector Multiplication

If:

  • $X$ is an $m \times n$ matrix
  • $\theta$ is an $n \times 1$ vector

Then

$$h = X\theta$$

  • Produces $h$, an $(m \times 1)$ vector

Use in Machine Learning

  • This gives predictions for all training examples in one operation.
  • Faster computation: optimized hardware usage (CPU/GPU)
  • Clean mathematical formulation

Linear regression hypothesis:

$$h_\theta(x) = \theta^T x$$

For all training examples:

$$h = X\theta$$
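A sketch contrasting the per-example loop with the vectorized $h = X\theta$; the numbers are toy values assumed for illustration:

```python
import numpy as np

m, n = 4, 3
X = np.arange(m * n, dtype=float).reshape(m, n)  # (m x n) data matrix
theta = np.array([1., 2., 3.])                   # parameter vector

# Loop version: one prediction theta^T x per training example
h_loop = np.array([theta @ X[i] for i in range(m)])

# Vectorized version: all m predictions in one matrix-vector product
h_vec = X @ theta

assert np.allclose(h_loop, h_vec)
print(h_vec.shape)  # (4,): one prediction per example
```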

Dot Product ($a \cdot b$)

The dot product is defined between two vectors of the same dimension.

If:

$$x, y \in \mathbb{R}^n$$

Then their dot product is:

$$x^T y \quad \text{where} \quad x = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} \quad y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}$$

$$x^T y = x_1 y_1 + x_2 y_2 + \dots + x_n y_n$$

It produces a single number (a scalar).
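Three equivalent NumPy spellings of the same dot product:

```python
import numpy as np

x = np.array([1., 2., 3.])
y = np.array([4., 5., 6.])

# x^T y = 1*4 + 2*5 + 3*6 = 32
print(np.dot(x, y))   # 32.0
print(x @ y)          # 32.0
print((x * y).sum())  # 32.0
```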


Summary

Key ideas:

  • Vectors represent features and parameters
  • Matrices represent datasets
  • Matrix multiplication enables fast prediction
  • Transpose and inverse enable optimization
  • Vectorization is essential for performance
AI-Math/MultiVariant-Linear-Algebra