Hitesh Sahu
Algebra for Notation and Geometry

Brief overview of matrix and vector notation, including size, transpose, inverse, determinant, multiplication, sets of numbers and vectors, vector norms, and transformations in the context of machine learning.

Linear Algebra
Machine Learning
Multivariate Linear Algebra
Vectors
Matrices
Geometry
Linear Algebra

Linear algebra is the language of space manipulation.

Machine learning is controlled geometric transformation.

Why Linear Algebra Matters in ML

Machine learning deals with:

  • Multiple features
  • Large datasets
  • Efficient computation
  • Vectorized operations

Instead of writing nested loops, we use matrix operations to compute predictions and updates efficiently.

This is why multivariate linear algebra is essential.

Think of ML as:

  • Data → points in high-dimensional space

  • Features → axes

  • Models → transformations

  • Training → adjusting geometry

  • Loss → distance between vectors

  • Optimization → walking downhill in space

  • Gradient descent becomes directional movement

  • Regularization becomes shrinking vector norms

  • Overfitting becomes high-dimensional distortion

  • RAG embeddings become spatial similarity

  • Attention becomes weighted projection

Mathematical object

A mathematical object is an abstract concept that can be assigned to a symbol and can therefore appear in formulas.

  • Examples: numbers, expressions, shapes, functions, and sets.
  • More complex objects: theorems, proofs.

Tensor

A tensor is an algebraic object that describes a multilinear relationship between sets of algebraic objects associated with a vector space.

  • From Latin tendere, meaning 'to stretch'

Scalar ($s \in \mathbb{R}$)

Scalars are the single real numbers used in linear algebra.

  • A single number, a 0-dimensional tensor.
  • Example: $5$, $-3.14$, $\pi$, $e$

Matrices ($A \in \mathbb{R}^{m \times n}$)

A matrix is a 2D array (a table) of numbers.

$$A = \begin{bmatrix} 85 & 76 & 66 & 5 \\ 94 & 75 & 18 & 28 \\ 68 & 40 & 71 & 5 \end{bmatrix}$$

Representation:

In mathematics:

$A \in \mathbb{R}^{3 \times 4}$

  • Where $A$ is a real-valued matrix with 3 rows and 4 columns.

In programming:

import numpy as np

A = np.array([[85, 76, 66, 5],
              [94, 75, 18, 28],
              [68, 40, 71, 5]])

In theory:

  • Uppercase letters (A, B, X) → Matrices
  • Lowercase letters (x, y, z) → Vectors or scalars

Matrix Size

  • A ∈ ℝᵐˣⁿ or A (m × n)
    → Matrix A has m rows and n columns

Example:
If A is 3 × 2, it has 3 rows and 2 columns.

Dimension:

$(m \times n)$

Where:

  • $m$ = rows
  • $n$ = columns
  • Example: the matrix $A$ above is $3 \times 4$

Square Matrix

  • A matrix with the same number of rows and columns ($m = n$).

Element notation:

$A_{ij}$

  • The element in the $i$-th row and $j$-th column.

Example:

  • $A_{11} = 85$ → Row 1, Column 1
  • $A_{32} = 40$ → Row 3, Column 2
  • $A_{14} = 5$ → Row 1, Column 4
  • $A_{23} = 18$ → Row 2, Column 3
  • $A_{64}$ is undefined → a 6th row does not exist in a $3 \times 4$ matrix
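The indexing examples above can be checked in NumPy. This is a minimal sketch; the variable names (`a_11`, `out_of_range`, and so on) are illustrative only. Note that NumPy is zero-indexed, so the mathematical entry $A_{ij}$ corresponds to `A[i - 1, j - 1]`.

```python
import numpy as np

# The 3 x 4 example matrix from this section.
A = np.array([[85, 76, 66, 5],
              [94, 75, 18, 28],
              [68, 40, 71, 5]])

rows, cols = A.shape          # 3 rows, 4 columns

# One-indexed math notation A_{ij} maps to zero-indexed A[i - 1, j - 1].
a_11 = A[0, 0]                # A_{11}
a_32 = A[2, 1]                # A_{32}
a_14 = A[0, 3]                # A_{14}
a_23 = A[1, 2]                # A_{23}

# Asking for a row that does not exist raises IndexError.
try:
    A[5, 3]                   # would be A_{64}
    out_of_range = False
except IndexError:
    out_of_range = True
```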

Use in Machine Learning

Matrices represent data sets, model parameters, and transformations.

If we have:

  • $m$ training examples
  • $n$ features

The data matrix is:

$$X = \begin{bmatrix} --- x^{(1)} --- \\ --- x^{(2)} --- \\ \vdots \\ --- x^{(m)} --- \end{bmatrix}$$

Dimension $m \times n$ where:

  • Each row = one training example
  • Each column = one feature

Vectors ($\vec{x}$)

A vector is a matrix with one column.

  • Represents a point in $n$-dimensional space
  • From Latin vector, meaning "carrier" or "driver"
  • Has a direction and a magnitude

$$x = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}$$

Represent as:

In Maths

$y \in \mathbb{R}^4$

  • ℝ → Set of real numbers
    Example: 0, −0.642, 2, 3.456

  • ℝ² → Set of 2-dimensional vectors

Example:

$$v = \begin{bmatrix} 1 \\ 3 \end{bmatrix}$$

  • ℝⁿ → Set of n-dimensional vectors

  • v ∈ ℝ² → Vector v belongs to ℝ²

In Programming

y = np.array([460, 232, 315, 178])

In Theory:

  • Lowercase letters (x, y, z) → Vectors (uppercase letters such as A, B, X are used for matrices)

Dimension: $n \times 1$

In ML, vectors represent:

  • A data point → a vector
  • A feature column → a direction
  • A model weight vector → a direction of best fit

Example

$$y = \begin{bmatrix} 460 \\ 232 \\ 315 \\ 178 \end{bmatrix}$$

  • A $4 \times 1$ matrix, or a 4-dimensional vector

Element Indexing

$y_i$ = the $i$-th element.

  • In mathematics, indexing usually starts at 1.
  • In programming indexing often starts at 0.
  • Unless otherwise specified, assume one-indexed notation in linear algebra.

Example:

  • $y_1 = 460$
  • $y_2 = 232$
  • $y_3 = 315$
  • $y_4 = 178$

🔹 Vector Norms

  • ‖v‖₁ → L1 norm
    $\|v\|_1 = \sum_i |v_i|$
  • ‖v‖₂, ‖v‖ → L2 norm (Euclidean norm)
    $\|v\|_2 = \sqrt{\sum_i v_i^2}$
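As a small sketch, both norms can be computed by hand and compared with `np.linalg.norm`. The vector `v` here is a hypothetical example chosen so the results are easy to verify.

```python
import numpy as np

v = np.array([3.0, -4.0])     # hypothetical example vector

# L1 norm: sum of absolute values.
l1_manual = np.sum(np.abs(v))
l1 = np.linalg.norm(v, ord=1)

# L2 (Euclidean) norm: square root of the sum of squares.
l2_manual = np.sqrt(np.sum(v ** 2))
l2 = np.linalg.norm(v)        # for 1-D input the default is the L2 norm
```

For this vector the L1 norm is 3 + 4 = 7 and the L2 norm is the familiar 3-4-5 triangle hypotenuse, 5.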

Transpose ($\mathbf{x}^T$)

Transpose swaps rows and columns.

  • Aᵀ → Transpose of matrix A
  • vᵀ → Transpose of vector v

Transpose flips rows into columns.

If $A \in \mathbb{R}^{m \times n}$, then $A^T \in \mathbb{R}^{n \times m}$.

Element-wise:

$(A^T)_{ij} = A_{ji}$

  • A column vector becomes a row vector.

Given:

$$A = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{bmatrix}$$

Then:

$$A^T = \begin{bmatrix} 1 & 4 \\ 2 & 5 \\ 3 & 6 \end{bmatrix}$$
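The same example can be reproduced in NumPy with the `.T` attribute. This is only a sketch; `rule_holds`, `col`, and `row` are illustrative names.

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6]])      # shape (2, 3)

A_T = A.T                      # shape (3, 2)

# Check the element-wise rule (A^T)_{ij} = A_{ji} for every entry.
rule_holds = all(A_T[i, j] == A[j, i]
                 for i in range(A_T.shape[0])
                 for j in range(A_T.shape[1]))

# A column vector (n x 1) becomes a row vector (1 x n).
col = np.array([[1], [3]])     # shape (2, 1)
row = col.T                    # shape (1, 2)
```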

Used heavily in:

  • Normal Equation
  • Gradient derivations

Identity Matrix ($I$)

The identity matrix is the matrix equivalent of the number 1.

It is a square matrix with:

  • 1’s on the diagonal
  • 0’s everywhere else

$$I = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$$

Property:

$AI = IA = A$
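A quick sketch of this property with `np.eye`, using a hypothetical $2 \times 2$ matrix:

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])     # hypothetical example matrix
I = np.eye(2)                  # 2 x 2 identity matrix

left = I @ A                   # IA
right = A @ I                  # AI
```

Both products leave `A` unchanged, just as multiplying a number by 1 does.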

Inverse Matrix ($A^{-1}$)

The inverse of a matrix plays the role of division: multiplying by $A^{-1}$ undoes multiplication by $A$.

  • Only square matrices can have inverses.

The matrix inverse satisfies:

$A^{-1}A = AA^{-1} = I$

Used in the Normal Equation:

$\theta = (X^T X)^{-1} X^T y$

Not all square matrices are invertible.

1. Invertible / Non-singular Matrix

A matrix that can be inverted.

  • It has an inverse if it is full rank (its rows and columns are linearly independent).

2. Non-invertible / Singular / Degenerate Matrix

A matrix that does not have an inverse.

  • It has no inverse because it is not full rank (its rows or columns are linearly dependent).

Causes of a non-invertible matrix (for example $X^T X$ in the Normal Equation):

  • Redundant features: two features related by a linear equation $x_2 = k x_1$, e.g. size in feet and size in meters.
  • More features than training examples ($m \le n$): delete some features or use regularization.

Octave functions for inverting a matrix:

  • pinv(A): the pseudo-inverse, which returns a result even if the matrix is not invertible
  • inv(A): the ordinary inverse
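The Octave functions above have NumPy counterparts, `np.linalg.inv` and `np.linalg.pinv`. The sketch below uses two hypothetical matrices: `A` is full rank, while `S` has a second column that is exactly twice the first (a redundant feature), so it is singular.

```python
import numpy as np

# Invertible (full-rank) matrix.
A = np.array([[4.0, 7.0],
              [2.0, 6.0]])
A_inv = np.linalg.inv(A)
is_identity = np.allclose(A @ A_inv, np.eye(2))

# Singular matrix: column 2 = 2 * column 1, so the columns are dependent.
S = np.array([[1.0, 2.0],
              [2.0, 4.0]])
rank_S = np.linalg.matrix_rank(S)     # not full rank

# The ordinary inverse fails for a singular matrix.
try:
    np.linalg.inv(S)
    inv_failed = False
except np.linalg.LinAlgError:
    inv_failed = True

# The pseudo-inverse is still defined for S.
S_pinv = np.linalg.pinv(S)
```

The pseudo-inverse is not a true inverse here (`S @ S_pinv` is not the identity), but it satisfies the weaker identity `S @ S_pinv @ S == S` up to rounding.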

Determinant ($\det(A)$)

The determinant tells us whether a matrix is invertible.

For a 2 × 2 matrix:

$$A = \begin{bmatrix} a & b \\ c & d \end{bmatrix} \qquad \det(A) = ad - bc$$

If $\det(A) \neq 0$: the matrix is invertible.

If $\det(A) = 0$: the matrix is singular (not invertible). The linear system $Ax = b$ then has:

  • Either no solution
  • Or infinitely many solutions
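The $2 \times 2$ formula is short enough to write out directly. In this sketch `det2` is a hypothetical helper, and the two matrices are made-up examples: one invertible, one singular.

```python
import numpy as np

def det2(M):
    """Determinant of a 2 x 2 matrix using ad - bc."""
    (a, b), (c, d) = M
    return a * d - b * c

A = np.array([[4.0, 7.0],
              [2.0, 6.0]])
S = np.array([[1.0, 2.0],
              [2.0, 4.0]])

det_A = det2(A)                # 4*6 - 7*2
det_S = det2(S)                # 1*4 - 2*2
det_A_np = np.linalg.det(A)    # library routine, for comparison

A_is_invertible = det_A != 0
S_is_invertible = det_S != 0
```

With floating-point data the determinant is rarely exactly zero, so in practice a tolerance (or a rank or condition-number check) is a safer test than comparing with 0.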

Use in Machine Learning:

  • Normal Equation requires matrix inversion.

Closed-form solution:

$\theta = (X^T X)^{-1} X^T y$

  • In practice, we use numerical methods (such as solving the linear system directly) to avoid the instability of explicit matrix inversion.
  • Regularization can help make the matrix invertible by adding a small value to the diagonal (Ridge Regression).
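A minimal sketch of both points, on hypothetical toy data generated from $y = 1 + 2x$. Instead of forming $(X^T X)^{-1}$, it solves the linear system $(X^T X)\theta = X^T y$ with `np.linalg.solve`. The ridge variant adds an assumed value `lam` to every diagonal entry; for simplicity this also penalizes the bias term, which many implementations avoid.

```python
import numpy as np

# Hypothetical toy data: y = 1 + 2 * x exactly.
x = np.array([0.0, 1.0, 2.0, 3.0])
X = np.column_stack([np.ones_like(x), x])   # m x n, first column is the bias
y = 1.0 + 2.0 * x

# Normal equation, solved as the linear system (X^T X) theta = X^T y.
theta = np.linalg.solve(X.T @ X, X.T @ y)

# Ridge regression: add lam to the diagonal of X^T X.
lam = 0.1                                    # assumed regularization strength
n = X.shape[1]
theta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(n), X.T @ y)
```

On this data `theta` recovers the intercept 1 and slope 2, while `theta_ridge` is pulled slightly toward zero, which is the "shrinking vector norms" effect of regularization.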

A matrix is a transformation of space.

Many machine learning models can be viewed as compositions of such transformations.

If:

$y = Ax$

Then $A$ transforms vector $x$ into a new vector $y$.

🔹 Transformations

  • T : ℝ² → ℝ³
    → T maps vectors from 2D space to 3D space

  • T(v) = w
    → Vector v ∈ ℝ² is transformed into w ∈ ℝ³

Geometrically, a matrix can:

  • Stretch
  • Compress
  • Rotate
  • Reflect
  • Shear
  • Project
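A few of these transformations can be sketched with small matrices. The matrices `R`, `S`, and `T` below are hypothetical examples: a 90-degree rotation, a stretch/compress, and a $3 \times 2$ matrix that maps ℝ² into ℝ³.

```python
import numpy as np

# Rotate 90 degrees counter-clockwise.
angle = np.pi / 2
R = np.array([[np.cos(angle), -np.sin(angle)],
              [np.sin(angle),  np.cos(angle)]])
rotated = R @ np.array([1.0, 0.0])     # the x-axis unit vector lands on the y-axis

# Stretch along x by 2 and compress along y by 0.5.
S = np.array([[2.0, 0.0],
              [0.0, 0.5]])
scaled = S @ np.array([1.0, 1.0])

# A 3 x 2 matrix maps vectors from R^2 to R^3 (T : R^2 -> R^3).
T = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
w = T @ np.array([1.0, 3.0])
```

The shape of the matrix determines the spaces involved: an $m \times n$ matrix takes $n$-dimensional inputs and produces $m$-dimensional outputs.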

Matrix Addition/Subtraction

When Is Addition Allowed?

Addition is done element by element.

Two matrices can be added only if they have the same dimensions.

If:

$A, B \in \mathbb{R}^{m \times n}$

then:

$C = A + B$, where $(A + B)_{ij} = A_{ij} + B_{ij}$,

is also an $(m \times n)$ matrix.

$$A = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix} \quad B = \begin{bmatrix} 5 & 6 \\ 7 & 8 \end{bmatrix}$$

$$A + B = \begin{bmatrix} 1+5 & 2+6 \\ 3+7 & 4+8 \end{bmatrix} = \begin{bmatrix} 6 & 8 \\ 10 & 12 \end{bmatrix}$$

Subtraction is the same but with minus signs.
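The same example in NumPy, as a short sketch. The last part shows what happens with incompatible shapes; `shape_error` is an illustrative name.

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])
B = np.array([[5, 6],
              [7, 8]])

C = A + B                      # element-by-element sum
D = A - B                      # element-by-element difference

# Shapes (2, 2) and (2, 3) are incompatible, so NumPy raises ValueError.
try:
    A + np.ones((2, 3))
    shape_error = False
except ValueError:
    shape_error = True
```

One caveat: NumPy's broadcasting rules accept some shape combinations that strict matrix addition does not (for example adding a length-2 vector to every row of `A`), so a missing error does not always mean the shapes matched mathematically.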


Scalar Multiplication / Division

Scalar multiplication multiplies every element of a matrix by a single number (a scalar): $(cA)_{ij} = c \, A_{ij}$. Division by a nonzero scalar $c$ is the same as multiplication by $1/c$.
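A brief sketch with a hypothetical matrix, showing that the scalar is applied to every element:

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])

doubled = 2 * A                # every element multiplied by 2
halved = A / 2                 # every element divided by 2
same_as_half = 0.5 * A         # division by 2 equals multiplication by 1/2
```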


Matrix-Matrix Multiplication

Each entry: $C_{ij} = \sum_{k} A_{ik} B_{kj}$ (row $i$ of $A$ times column $j$ of $B$, summed over $k$)

  • AB → Matrix multiplication of A and B
    (Valid only if inner dimensions match)

Given 2 Matrices:

  • $A$ is $(m \times n)$, i.e. $A \in \mathbb{R}^{m \times n}$
  • $B$ is $(n \times p)$, i.e. $B \in \mathbb{R}^{n \times p}$

Then:

$C = AB$

where $C$ is a new matrix with dimensions:

  • $C$ is $(m \times p)$, i.e. $C \in \mathbb{R}^{m \times p}$
  • The inner dimensions ($n$) must match: $(m \times n)(n \times p) \rightarrow (m \times p)$

Properties:

  • Not commutative: in general $AB \ne BA$, so order matters
  • Associative: $(AB)C = A(BC)$
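The shape rule, the entry formula, and both properties can be checked on small hypothetical matrices. This sketch uses the `@` operator for matrix multiplication and an explicit triple loop (`C_loop`) only to mirror the summation over $k$.

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6]])          # 2 x 3
B = np.array([[1, 0],
              [0, 1],
              [1, 1]])             # 3 x 2

C = A @ B                          # (2 x 3)(3 x 2) -> 2 x 2

# The same product from the entry formula C_ij = sum_k A_ik * B_kj.
m, n = A.shape
p = B.shape[1]
C_loop = np.zeros((m, p), dtype=int)
for i in range(m):
    for j in range(p):
        for k in range(n):
            C_loop[i, j] += A[i, k] * B[k, j]

# Not commutative: PQ and QP differ for these square matrices.
P = np.array([[1, 2],
              [3, 4]])
Q = np.array([[0, 1],
              [1, 0]])
commutes = np.array_equal(P @ Q, Q @ P)

# Associative: grouping does not change the result.
associative = np.array_equal((P @ Q) @ P, P @ (Q @ P))
```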

Use in Machine Learning

Most of the computation in deep learning is matrix multiplication:

  • Inputs × Weights
  • Weights × Activations
  • Gradient updates

Neural network forward pass: $Z = WX + b$

Backpropagation is also matrix calculus.

Understanding multivariate linear algebra makes deep learning much easier to grasp.
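As a rough sketch of the forward pass $Z = WX + b$ for a single layer: the layer sizes and random values below are assumptions for illustration, and here each column of `X` is one example (the usual convention when $W$ multiplies from the left, unlike the row-per-example data matrix used earlier).

```python
import numpy as np

rng = np.random.default_rng(0)

n_in, n_out, m = 3, 2, 5               # assumed layer sizes and batch size
W = rng.normal(size=(n_out, n_in))     # weights: n_out x n_in
X = rng.normal(size=(n_in, m))         # inputs: one example per column
b = np.zeros((n_out, 1))               # bias, broadcast across the m columns

Z = W @ X + b                          # pre-activations: n_out x m

# The first column of Z is the forward pass for the first example alone.
z_first = W @ X[:, 0] + b[:, 0]
```

One matrix product computes the layer output for all `m` examples at once, which is the vectorization idea discussed next.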


Vectorization: Matrix-Vector Multiplication

If:

  • $X$ is an $m \times n$ matrix
  • $\theta$ is an $n \times 1$ vector

Then

$h = X\theta$

  • Produces an $m \times 1$ vector $h$

Use in Machine Learning

  • This gives predictions for all training examples in one operation.
  • Faster computation: optimized hardware usage (CPU/GPU)
  • Clean mathematical formulation

Linear regression hypothesis:

$h_\theta(x) = \theta^T x$

For all training examples:

$h = X\theta$
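To make the comparison concrete, the sketch below computes the same predictions twice: once with an explicit loop over examples using $\theta^T x^{(i)}$, and once with the single product $X\theta$. The data and parameter values are hypothetical.

```python
import numpy as np

# Hypothetical data: m = 3 examples, n = 2 (a bias column plus one feature).
X = np.array([[1.0, 2104.0],
              [1.0, 1416.0],
              [1.0, 1534.0]])
theta = np.array([[-40.0],
                  [0.25]])             # n x 1 parameter vector

# Loop version: one prediction per training example, theta^T x^(i).
m = X.shape[0]
h_loop = np.zeros((m, 1))
for i in range(m):
    h_loop[i, 0] = theta[:, 0] @ X[i]

# Vectorized version: all predictions in one matrix-vector product.
h_vec = X @ theta                      # m x 1
```

Both give identical results; the vectorized form hands the whole computation to optimized linear algebra routines instead of a Python loop.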

Dot Product ($a \cdot b$)

The dot product is defined between two vectors of the same dimension.

If:

$x, y \in \mathbb{R}^n$

Then their dot product is:

$x^T y$

$$x = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} \quad y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}$$

$$x^T y = x_1 y_1 + x_2 y_2 + \dots + x_n y_n$$

It produces a single number (a scalar).

  • u · v or ⟨u, v⟩ → Dot product of vectors

Dot product formula:

$$u \cdot v = \sum_{i=1}^{n} u_i v_i$$
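A short sketch comparing the summation formula with NumPy's built-in dot product, on two hypothetical vectors:

```python
import numpy as np

u = np.array([1.0, 2.0, 3.0])
v = np.array([4.0, -5.0, 6.0])

# Summation formula: multiply matching elements, then add them up.
dot_sum = sum(u_i * v_i for u_i, v_i in zip(u, v))

# Library equivalents for 1-D arrays.
dot_np = np.dot(u, v)
dot_at = u @ v

# With explicit column vectors (n x 1), x^T y is a 1 x 1 matrix.
x = u.reshape(-1, 1)
y = v.reshape(-1, 1)
xTy = x.T @ y
```

Here the result is 1·4 + 2·(−5) + 3·6 = 12. For 1-D arrays the result is a scalar, while the column-vector form returns a $1 \times 1$ array holding the same number.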

Summary

Key ideas:

  • Vectors represent features and parameters
  • Matrices represent datasets
  • Matrix multiplication enables fast prediction
  • Transpose and inverse enable optimization
  • Vectorization is essential for performance
Written by Hitesh Sahu, a passionate developer and blogger.

Fri Feb 27 2026
