Multivariate Linear Regression: Concepts and Implementation

Comprehensive guide to multivariate linear regression, covering multiple input features, model formulation, assumptions, cost function, gradient descent optimization, and evaluation techniques.

Written by Hitesh Sahu, a passionate developer and blogger.

Thu Feb 19 2026


Multivariate Linear Regression

Linear regression with multiple variables is also known as "multivariate linear regression".

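Suppose y is a function of multiple input variables x. The notation used below is:

In general, the subscript indexes the column and the superscript the row: $x_j^{(i)} = X(i, j)$

m = number of training examples

n = number of features in each training example

$x^{(i)}$ = the input features of the i-th training example (the i-th row of X)

$x_j^{(i)}$ = value of feature j in the i-th training example (row i, column j of X)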
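A minimal sketch of this notation in Python, assuming the training set is stored as a list of rows (one list per training example). The values in `X` and the helper `x_ij` are hypothetical; note that the math notation counts rows and columns from 1, while Python indexing starts at 0.

```python
from typing import List

# Hypothetical toy training set: m = 3 examples (rows), n = 2 features (columns).
X: List[List[float]] = [
    [3.0, 5.0],  # x^(1)
    [1.0, 2.0],  # x^(2)
    [4.0, 6.0],  # x^(3)
]

m = len(X)     # number of training examples
n = len(X[0])  # number of features per example


def x_ij(X: List[List[float]], i: int, j: int) -> float:
    """Return x_j^(i) using the 1-based row i and column j of the math notation."""
    return X[i - 1][j - 1]


print(m, n)           # 3 2
print(X[1])           # [1.0, 2.0], the vector x^(2)
print(x_ij(X, 3, 2))  # 6.0, feature 2 of example 3
```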

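Hypothesis for linear regression with n features:

$h_\Theta(x) = \Theta_0 + \Theta_1 x_1 + \Theta_2 x_2 + \Theta_3 x_3 + \cdots + \Theta_n x_n$

Assume $x_0 = 1$ for convenience, so that the intercept can also be written as a product $\Theta_0 x_0$:

$h_\Theta(x) = \Theta_0 x_0 + \Theta_1 x_1 + \Theta_2 x_2 + \Theta_3 x_3 + \cdots + \Theta_n x_n$

$h_\Theta(x) = \Theta^T x$

This is the inner product of the row vector $\Theta^T$ (size 1 × (n+1)) and the column vector x (size (n+1) × 1), which yields a 1 × 1 scalar: the value of the hypothesis.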

Notes:

$\Theta^T$ is a 1 × (n+1) matrix (a row vector), not an (n+1) × 1 matrix.
$x_0^{(i)} = 1$ for every training example i, which makes the matrix multiplication work.
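The hypothesis $h_\Theta(x) = \Theta^T x$ can be sketched in plain Python as an inner product, assuming a feature vector is a list and that `x0 = 1` is prepended by a helper. The names `add_bias` and `hypothesis` are illustrative, not from any library.

```python
from typing import List


def add_bias(features: List[float]) -> List[float]:
    """Prepend x0 = 1 so that Theta_0 acts as the intercept term."""
    return [1.0] + list(features)


def hypothesis(theta: List[float], x: List[float]) -> float:
    """h_Theta(x) = Theta^T x; theta and x both have n + 1 entries (x includes x0 = 1)."""
    if len(theta) != len(x):
        raise ValueError("theta and x must both have n + 1 entries")
    return sum(t * xj for t, xj in zip(theta, x))


# n = 2 features: h = 1*1 + 2*3 + 0.5*4 = 9.0
theta = [1.0, 2.0, 0.5]
x = add_bias([3.0, 4.0])
print(hypothesis(theta, x))  # 9.0
```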

Simple Linear Regression

Cost function:


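$J(\Theta_0, \Theta_1) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\Theta(x^{(i)}) - y^{(i)} \right)^2$

That is, half the average of the squared deviations between predicted and actual values over the m training examples. Squaring removes the sign, so over- and under-predictions do not cancel each other out.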
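A small sketch of this cost function for one input feature, assuming the training data is given as two plain lists `xs` and `ys`. The function name and the toy data (generated from y = 2x) are hypothetical.

```python
from typing import List


def cost_simple(theta0: float, theta1: float, xs: List[float], ys: List[float]) -> float:
    """J(Theta0, Theta1) = 1/(2m) * sum((h(x_i) - y_i)^2) with h(x) = Theta0 + Theta1 * x."""
    m = len(xs)
    squared_errors = ((theta0 + theta1 * x - y) ** 2 for x, y in zip(xs, ys))
    return sum(squared_errors) / (2 * m)


xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]  # toy data generated by y = 2x
print(cost_simple(0.0, 2.0, xs, ys))  # 0.0, a perfect fit
print(cost_simple(0.0, 1.0, xs, ys))  # (1 + 4 + 9) / 6 = 2.333...
```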

Gradient Descent:

Steps:

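Repeat until convergence, for feature index j = 0, 1:
Compute the new values of $\Theta_0$ and $\Theta_1$ simultaneously, using $\Theta_j := \Theta_j - \alpha \frac{\partial}{\partial \Theta_j} J(\Theta_0, \Theta_1)$ with learning rate $\alpha$, and store them in temporary variables.
Then update $\Theta_0$ and $\Theta_1$ together from those temporary values, so that neither update uses a half-updated parameter.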
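The steps above can be sketched as follows, assuming the squared-error cost, whose partial derivatives are $\frac{1}{m}\sum (h - y)$ for $\Theta_0$ and $\frac{1}{m}\sum (h - y)\,x$ for $\Theta_1$. A fixed iteration count stands in for "until convergence"; `alpha`, `iterations`, and the function name are illustrative choices.

```python
from typing import List, Tuple


def gradient_descent_simple(
    xs: List[float], ys: List[float], alpha: float = 0.1, iterations: int = 1000
) -> Tuple[float, float]:
    """Batch gradient descent for h(x) = Theta0 + Theta1 * x."""
    m = len(xs)
    theta0, theta1 = 0.0, 0.0
    for _ in range(iterations):
        errors = [theta0 + theta1 * x - y for x, y in zip(xs, ys)]
        # Compute both new values into temporaries first (simultaneous update) ...
        temp0 = theta0 - alpha * sum(errors) / m
        temp1 = theta1 - alpha * sum(e * x for e, x in zip(errors, xs)) / m
        # ... and only then overwrite the parameters.
        theta0, theta1 = temp0, temp1
    return theta0, theta1


xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]  # toy data generated by y = 2x
print(gradient_descent_simple(xs, ys, alpha=0.1, iterations=5000))  # close to (0.0, 2.0)
```

Updating `theta0` in place before computing `temp1` would make the second update use the new $\Theta_0$, which is no longer gradient descent on $J(\Theta_0, \Theta_1)$ as defined.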

Multivariate Linear Regression

Cost Function

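$J(\Theta_0, \Theta_1, \ldots, \Theta_n) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\Theta(x^{(i)}) - y^{(i)} \right)^2$

As before, this is half the average squared deviation between predicted and actual values. Writing $\Theta$ for the whole parameter vector $(\Theta_0, \ldots, \Theta_n)$ gives the compact form:

$J(\Theta) = \frac{1}{2m} \sum_{i=1}^{m} \left( \Theta^T x^{(i)} - y^{(i)} \right)^2$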
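A sketch of $J(\Theta)$ for n features, assuming each row of `X` already starts with $x_0 = 1$ and `theta` has n + 1 entries. The toy data below is hypothetical and was generated from $y = 1 + 2x_1 + x_2$.

```python
from typing import List


def cost(theta: List[float], X: List[List[float]], y: List[float]) -> float:
    """J(Theta) = 1/(2m) * sum_i (Theta^T x^(i) - y^(i))^2; rows of X include x0 = 1."""
    m = len(X)
    total = 0.0
    for row, target in zip(X, y):
        prediction = sum(t * xj for t, xj in zip(theta, row))  # Theta^T x^(i)
        total += (prediction - target) ** 2
    return total / (2 * m)


X = [
    [1.0, 1.0, 2.0],
    [1.0, 2.0, 0.0],
    [1.0, 3.0, 1.0],
]
y = [5.0, 5.0, 8.0]  # toy data from y = 1 + 2*x1 + 1*x2
print(cost([1.0, 2.0, 1.0], X, y))  # 0.0
print(cost([0.0, 0.0, 0.0], X, y))  # (25 + 25 + 64) / 6 = 19.0
```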


Gradient Descent:

Steps:

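Repeat until convergence, for every feature index j = 0, 1, ..., n:
Compute the new values of $\Theta_0, \Theta_1, \ldots, \Theta_n$ simultaneously, using $\Theta_j := \Theta_j - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\Theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}$, and store them in temporary variables.
Then update $\Theta_0, \Theta_1, \ldots, \Theta_n$ together from those temporary values.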
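A minimal sketch of these steps for n features, assuming the squared-error cost (so the partial derivative for $\Theta_j$ is $\frac{1}{m}\sum_i (h_\Theta(x^{(i)}) - y^{(i)})\,x_j^{(i)}$) and rows of `X` that already include $x_0 = 1$. A fixed `iterations` count replaces a real convergence test; all names and the toy data are hypothetical.

```python
from typing import List


def gradient_descent(
    X: List[List[float]], y: List[float], alpha: float = 0.1, iterations: int = 1000
) -> List[float]:
    """Batch gradient descent on J(Theta); returns [Theta_0, ..., Theta_n]."""
    m, n_plus_1 = len(X), len(X[0])
    theta = [0.0] * n_plus_1
    for _ in range(iterations):
        # h_Theta(x^(i)) - y^(i) for every training example
        errors = [sum(t * xj for t, xj in zip(theta, row)) - target for row, target in zip(X, y)]
        # The new list plays the role of the temporaries: it is built entirely from the
        # old theta and only then replaces it, so all Theta_j are updated simultaneously.
        theta = [
            theta[j] - alpha * sum(e * row[j] for e, row in zip(errors, X)) / m
            for j in range(n_plus_1)
        ]
    return theta


X = [
    [1.0, 1.0, 2.0],
    [1.0, 2.0, 0.0],
    [1.0, 3.0, 1.0],
]
y = [5.0, 5.0, 8.0]  # toy data from y = 1 + 2*x1 + 1*x2
print(gradient_descent(X, y, alpha=0.1, iterations=10000))  # close to [1.0, 2.0, 1.0]
```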

Feature scaling

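Make sure features are on a similar scale; otherwise the contours of the cost function $J(\Theta)$ become highly skewed ellipses (for example, when one feature ranges up to 2000 and another only up to 5, a ratio of 2000/5). Gradient descent on such skewed contours takes a long time to reach the minimum.

One option is min-max scaling, which maps each feature into [0, 1]:

$x_i := \frac{x_i - \min(X)}{\max(X) - \min(X)}$

Aim to get every feature roughly into the range $-1 \le x_i \le 1$. The ranges do not need to match exactly; anything roughly within -3 to +3 is acceptable, but a feature with a much wider range is not properly scaled.

Feature scaling is therefore a way to avoid skewed elliptical contours.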
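A sketch of min-max scaling for a single feature column, assuming the column is a plain list. The function name and the example sizes are illustrative; a constant column (such as $x_0 = 1$) has zero range and is rejected rather than divided by zero.

```python
from typing import List


def min_max_scale(values: List[float]) -> List[float]:
    """x_i := (x_i - min(X)) / (max(X) - min(X)), mapping the column into [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("constant feature: range is zero, nothing to scale")
    return [(v - lo) / (hi - lo) for v in values]


sizes = [1000.0, 1500.0, 2000.0]  # hypothetical feature with a large range
print(min_max_scale(sizes))  # [0.0, 0.5, 1.0]
```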


Mean Normalization

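Subtract the mean and divide by the range (max - min), so that each feature ends up with zero mean. Do not normalize $x_0 = 1$.

$x_i := \frac{x_i - \operatorname{avg}(X)}{\max(X) - \min(X)}$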
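A sketch of mean normalization for one feature column, assuming the column is a plain list. In a full design matrix it would be applied to every feature column except $x_0$, whose range is zero; the function name is illustrative.

```python
from typing import List


def mean_normalize(values: List[float]) -> List[float]:
    """x_i := (x_i - avg(X)) / (max(X) - min(X)); the result has zero mean."""
    spread = max(values) - min(values)
    if spread == 0:
        raise ValueError("constant feature (such as x0 = 1): leave it unnormalized")
    mu = sum(values) / len(values)
    return [(v - mu) / spread for v in values]


print(mean_normalize([1.0, 2.0, 3.0, 4.0, 5.0]))  # [-0.5, -0.25, 0.0, 0.25, 0.5]
```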


Debugging gradient descent using the learning rate

Plot the cost function $J(\Theta)$ against the number of gradient descent iterations. If $J(\Theta)$ ever increases, you probably need to decrease $\alpha$.

If $\alpha$ is too small: convergence is slow.
If $\alpha$ is too large: $J(\Theta)$ may not decrease on every iteration and may not converge at all.
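Plotting needs a library such as matplotlib, so this stdlib-only sketch records $J(\Theta)$ at every iteration and checks the history for any increase instead. It assumes the one-feature hypothesis $\Theta_0 + \Theta_1 x$ and hypothetical toy data; for this data, $\alpha = 0.1$ decreases $J$ steadily while $\alpha = 0.5$ is too large and makes $J$ grow.

```python
from typing import List


def cost_history(xs: List[float], ys: List[float], alpha: float, iterations: int) -> List[float]:
    """Run gradient descent for h(x) = Theta0 + Theta1 * x and record J before each update."""
    m = len(xs)
    theta0 = theta1 = 0.0
    history = []
    for _ in range(iterations):
        errors = [theta0 + theta1 * x - y for x, y in zip(xs, ys)]
        history.append(sum(e * e for e in errors) / (2 * m))
        # Simultaneous update: the right-hand side uses only the old theta0 and theta1.
        theta0, theta1 = (
            theta0 - alpha * sum(errors) / m,
            theta1 - alpha * sum(e * x for e, x in zip(errors, xs)) / m,
        )
    return history


def ever_increases(history: List[float]) -> bool:
    """True if J goes up between any two consecutive iterations, a sign alpha is too large."""
    return any(later > earlier for earlier, later in zip(history, history[1:]))


xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]
print(ever_increases(cost_history(xs, ys, alpha=0.1, iterations=50)))  # False
print(ever_increases(cost_history(xs, ys, alpha=0.5, iterations=50)))  # True
```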

Polynomial Regression

Sometimes defining a new feature from existing ones gives a better model that also requires less computation.

For example, house price can be modelled with a single area feature, computed from the lot's two dimensions, so that a one-variable equation replaces a two-variable one.
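A tiny sketch of defining such a feature, assuming the two dimensions are stored as hypothetical columns `frontage` and `depth` (these names and values are illustrative, not from the original data). The hypothesis then becomes $h_\Theta(x) = \Theta_0 + \Theta_1 \cdot \text{area}$.

```python
from typing import List


def area_feature(frontage: List[float], depth: List[float]) -> List[float]:
    """Combine two hypothetical measurements into one feature: area = frontage * depth."""
    if len(frontage) != len(depth):
        raise ValueError("both columns need one value per training example")
    return [f * d for f, d in zip(frontage, depth)]


# Hypothetical lot dimensions for three houses
frontage = [10.0, 20.0, 15.0]
depth = [30.0, 15.0, 40.0]
print(area_feature(frontage, depth))  # [300.0, 300.0, 600.0]
```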


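Sometimes the data follows a polynomial curve rather than a straight line; adding powers of an existing feature (for example $x^2$ and $x^3$) as new features lets linear regression fit such curves.
Feature scaling becomes crucial in polynomial regression, because $x^2$ and $x^3$ have much larger ranges than $x$.
Some algorithms can choose features automatically to fit polynomial curves.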
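A sketch of polynomial regression as linear regression on new features, assuming a single input x expanded to $[1, x, x^2]$, mean normalization of every column except $x_0$, and the batch gradient descent described above. The toy data (points on $y = x^2$), `alpha`, `iterations`, and all function names are hypothetical choices for illustration.

```python
from typing import List, Tuple

Stats = List[Tuple[float, float]]


def polynomial_features(xs: List[float], degree: int) -> List[List[float]]:
    """Map each x to [x^0, x^1, ..., x^degree]; x^0 = 1 serves as x0."""
    return [[x ** p for p in range(degree + 1)] for x in xs]


def fit_stats(rows: List[List[float]]) -> Stats:
    """(mean, range) per column for mean normalization; column 0 (x0 = 1) stays unchanged."""
    stats: Stats = [(0.0, 1.0)]
    for j in range(1, len(rows[0])):
        col = [r[j] for r in rows]
        stats.append((sum(col) / len(col), max(col) - min(col)))
    return stats


def scale(rows: List[List[float]], stats: Stats) -> List[List[float]]:
    return [[(v - mu) / rng for v, (mu, rng) in zip(r, stats)] for r in rows]


def gradient_descent(X: List[List[float]], y: List[float], alpha: float, iterations: int) -> List[float]:
    m, k = len(X), len(X[0])
    theta = [0.0] * k
    for _ in range(iterations):
        errors = [sum(t * v for t, v in zip(theta, r)) - target for r, target in zip(X, y)]
        # The new list is built from the old theta, so all parameters update simultaneously.
        theta = [theta[j] - alpha * sum(e * r[j] for e, r in zip(errors, X)) / m for j in range(k)]
    return theta


xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [x ** 2 for x in xs]  # toy data on a parabola, which no straight line fits exactly
raw = polynomial_features(xs, degree=2)  # columns 1, x, x^2: x^2 spans 1..25, x only 1..5
stats = fit_stats(raw)
theta = gradient_descent(scale(raw, stats), ys, alpha=0.5, iterations=20000)


def predict(x: float) -> float:
    """Scale a new input with the training statistics before applying theta."""
    row = scale(polynomial_features([x], degree=2), stats)[0]
    return sum(t * v for t, v in zip(theta, row))


print(predict(6.0))  # close to 36.0
```

New inputs must be scaled with the same means and ranges computed on the training data, which is why `fit_stats` is kept separate from `scale`.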
