

Recommender Systems: Collaborative Filtering, Content-Based Filtering, and Hybrid Approaches

Comprehensive guide to recommender systems, covering collaborative filtering, content-based filtering, and hybrid approaches, with practical implementation examples and best practices for building effective recommendation engines.

Written by Hitesh Sahu, a passionate developer and blogger.

Fri Feb 27 2026


Recommender Systems

A recommender system predicts how users would rate items they have not yet rated.

Example:

  • $n_u$ = number of users
  • $n_m$ = number of movies

Users rate some movies (1–5 stars), but many ratings are missing.
Goal: predict the missing ratings.
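The setup above can be sketched as a small ratings matrix. This is an illustrative toy example (the variable names `Y` and `R` follow the notation used later in this post); missing ratings are marked with `np.nan`:

```python
import numpy as np

# Toy ratings matrix Y (movies x users): rows = n_m movies, cols = n_u users.
# np.nan marks a rating the user has not given; the goal is to fill these in.
Y = np.array([
    [5.0, 5.0, 0.0, np.nan],   # Love at Last
    [5.0, np.nan, 0.0, 0.0],   # Romance Forever
    [np.nan, 0.0, 5.0, 4.0],   # Swords vs Karate
])

R = ~np.isnan(Y)               # R[i, j] is True iff user j rated movie i
print(R.sum())                 # 9 observed ratings out of 12 entries
```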


Content-Based Idea

Each movie has features describing its content.

Example features:

  • $x_1$ = romance level
  • $x_2$ = action level

Example movie features:

| Movie | Romance ($x_1$) | Action ($x_2$) |
|---|---|---|
| Love at Last | 0.9 | 0 |
| Romance Forever | 1.0 | 0.01 |
| Swords vs Karate | 0 | 0.9 |
| Nonstop Car Chases | 0.1 | 1.0 |
| Cute Puppies of Love | 0.99 | 0 |

Each movie is represented by a feature vector:

$$x^{(i)} = \begin{bmatrix} 1 \\ x_1 \\ x_2 \end{bmatrix}$$

The first element is the bias feature:

$$x_0 = 1$$

If we have $n$ features, then:

$$x^{(i)} \in \mathbb{R}^{n+1}$$
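A minimal sketch of building these feature vectors from the table above, prepending the bias feature $x_0 = 1$ to each row (the array names are illustrative):

```python
import numpy as np

# Movie feature table from above: columns are (romance x1, action x2).
features = np.array([
    [0.9, 0.0],    # Love at Last
    [1.0, 0.01],   # Romance Forever
    [0.0, 0.9],    # Swords vs Karate
    [0.1, 1.0],    # Nonstop Car Chases
    [0.99, 0.0],   # Cute Puppies of Love
])

# Prepend the bias feature x0 = 1, so each x^(i) lives in R^(n+1).
X = np.hstack([np.ones((features.shape[0], 1)), features])
print(X[4])   # feature vector for "Cute Puppies of Love": bias, romance, action
```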

User Preference Model

Each user $j$ has their own parameter vector:

$$\theta^{(j)} \in \mathbb{R}^{n+1}$$

It represents the user's preferences for features.

Example:

  • Alice likes romance → high weight on $x_1$
  • Bob likes action → high weight on $x_2$

Rating Prediction

Predicted rating of user $j$ for movie $i$:

$$\hat{y}^{(i,j)} = (\theta^{(j)})^T x^{(i)}$$

This is just linear regression.

Example

Movie: Cute Puppies of Love

Feature vector:

$$x = \begin{bmatrix} 1 \\ 0.99 \\ 0 \end{bmatrix}$$

Alice's preference vector:

$$\theta^{(1)} = \begin{bmatrix} 0 \\ 5 \\ 0 \end{bmatrix}$$

Prediction:

$$\hat{y} = (\theta^{(1)})^T x$$

Result:

$$\hat{y} = 5 \times 0.99 = 4.95$$

Predicted rating ≈ 5 stars.
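The worked example above is a single dot product, which can be checked in a few lines of NumPy:

```python
import numpy as np

# Worked example: Alice's parameters and the "Cute Puppies of Love" features.
x = np.array([1.0, 0.99, 0.0])          # [x0, romance, action]
theta_alice = np.array([0.0, 5.0, 0.0])

y_hat = theta_alice @ x                 # (theta^(1))^T x
print(y_hat)                            # 4.95
```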


Cost Function

Single User

Let

  • $j$ = user index
  • $i$ = movie/item index

Rating

  • $r(i,j) = 1$ if user $j$ has rated movie $i$, and $0$ otherwise
  • $y^{(i,j)}$ = rating given by user $j$ to movie $i$ (defined only where $r(i,j) = 1$)

  • $\theta^{(j)}$ = parameter vector for user $j$ (encodes the kinds of movies user $j$ likes)

Predicted Rating

For user $j$ and movie $i$, the predicted rating is:

$$(\theta^{(j)})^T x^{(i)}$$

For user $j$, minimize the squared error over the movies that user has rated:

$$J(\theta^{(j)}) = \frac{1}{2} \sum_{i:r(i,j)=1} \left((\theta^{(j)})^T x^{(i)} - y^{(i,j)}\right)^2 + \frac{\lambda}{2} \sum_{k=1}^{n} \left(\theta_k^{(j)}\right)^2$$

Here $m^{(j)}$ = number of movies rated by user $j$. The conventional $\frac{1}{2m^{(j)}}$ factor is simplified to $\frac{1}{2}$: dropping the constant $m^{(j)}$ does not change which $\theta^{(j)}$ minimizes the cost.

Second term = regularization (prevents overfitting).

All Users

We learn parameters for all users:

$$J(\theta^{(1)}, \dots, \theta^{(n_u)}) = \frac{1}{2} \sum_{j=1}^{n_u} \sum_{i:r(i,j)=1} \left((\theta^{(j)})^T x^{(i)} - y^{(i,j)}\right)^2 + \frac{\lambda}{2} \sum_{j=1}^{n_u} \sum_{k=1}^{n} \left(\theta_k^{(j)}\right)^2$$

Minimize this to learn all user preferences.
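The all-users cost above can be vectorized in NumPy. This is a minimal sketch under the conventions used in this post (movies as rows, users as columns; the function name and argument names are illustrative):

```python
import numpy as np

def content_based_cost(Theta, X, Y, R, lam):
    """Regularized squared-error cost over all users.

    Theta: (n_u, n+1) user parameter vectors (column 0 multiplies the bias).
    X:     (n_m, n+1) movie feature vectors (column 0 is the bias x0 = 1).
    Y:     (n_m, n_u) ratings; only entries with R == 1 contribute.
    R:     (n_m, n_u) indicator, R[i, j] = 1 if user j rated movie i.
    lam:   regularization strength lambda (the bias theta_0 is not regularized).
    """
    errors = (X @ Theta.T - Y) * R                 # zero out unrated entries
    cost = 0.5 * np.sum(errors ** 2)
    reg = 0.5 * lam * np.sum(Theta[:, 1:] ** 2)    # skip theta_0
    return cost + reg
```

Multiplying the error matrix by the indicator `R` implements the $\sum_{i:r(i,j)=1}$ restriction without any explicit loop.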


Optimization

Parameters are learned using:

  • Gradient Descent
  • or advanced optimizers (L-BFGS, conjugate gradient)

Updates look similar to linear regression updates.
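One batch gradient-descent step can be written in vectorized form. This is a sketch following the cost defined above (function and argument names are illustrative); the gradient of the regularization term skips the bias parameter $\theta_0$, just as in regularized linear regression:

```python
import numpy as np

def gradient_step(Theta, X, Y, R, lam, alpha):
    """One batch gradient-descent update on all user parameter vectors.

    Theta: (n_u, n+1), X: (n_m, n+1), Y and R: (n_m, n_u),
    lam: regularization strength, alpha: learning rate.
    """
    errors = (X @ Theta.T - Y) * R       # (n_m, n_u), zero where unrated
    grad = errors.T @ X                  # (n_u, n+1): sum of error * x over movies
    grad[:, 1:] += lam * Theta[:, 1:]    # regularize all but theta_0
    return Theta - alpha * grad
```

Repeating this step drives each $\theta^{(j)}$ toward the minimizer of the cost, exactly as in linear regression run once per user.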


Why It's Called Content-Based

Because predictions depend on item features:

  • romance
  • action
  • genre
  • actors
  • etc.

The system recommends items whose content matches the user's preferences.
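Once the parameters are learned, producing recommendations is just ranking unseen items by predicted rating. A hypothetical sketch for a single user (movie titles and the `seen` mask are illustrative):

```python
import numpy as np

movies = ["Love at Last", "Romance Forever", "Swords vs Karate"]
X = np.array([[1, 0.9, 0.0], [1, 1.0, 0.01], [1, 0.0, 0.9]])
theta = np.array([0.0, 5.0, 0.0])       # a romance-loving user
seen = np.array([True, False, False])   # already rated "Love at Last"

preds = X @ theta                       # predicted rating for every movie
order = np.argsort(-preds)              # best first
recs = [movies[i] for i in order if not seen[i]]
print(recs[0])                          # "Romance Forever"
```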


Limitation

Requires hand-crafted features for items.

In many real systems:

  • item features are often unavailable
  • or hard to define by hand

This leads to the next method:

Collaborative Filtering.
