

Collaborative Filtering: Building Recommender Systems with Feature Learning

Learn how collaborative filtering powers modern recommender systems by simultaneously learning user preferences and item features from rating data. Understand the optimization objective, matrix factorization approach, and how gradient-based methods enable scalable recommendations.

Written by Hitesh Sahu, a passionate developer and blogger.

Fri Feb 27 2026


Collaborative Filtering

Collaborative filtering is a recommender-system technique that learns both user preferences and item features automatically from rating data.

Unlike content-based methods, we do not know the features of movies beforehand.

Collaborative filtering learns hidden features of users and items so it can predict missing ratings and recommend things people will likely enjoy.

Why It’s Called Collaborative

Many users rate movies.

Their ratings collaboratively help the system learn features.

Result:

  • Better movie representations
  • Better recommendations for everyone

Key Idea

People with similar tastes tend to like similar things.

Collaborative filtering simultaneously learns:

  • user preferences $\theta$
  • item features $x$

directly from the rating matrix, without manually defining features.

Movie Feature Matrix $x^{(i)}$

| Movie     | Romance | Action |
|-----------|---------|--------|
| Titanic   | 0.9     | 0.1    |
| Notebook  | 0.95    | 0.05   |
| Avengers  | 0.1     | 0.9    |
| John Wick | 0.05    | 0.95   |

$$x^{(1)} = \begin{bmatrix} 0.9 \\ 0.1 \end{bmatrix}$$

where

  • $x_1$ = romance level
  • $x_2$ = action level

From this we infer $x^{(1)}$:

  • Movie is romantic
  • Movie is not action

No Intercept Term

Unlike previous models:

  • We remove the intercept feature $x_0 = 1$.

$$x^{(i)} \in \mathbb{R}^n, \qquad \theta^{(j)} \in \mathbb{R}^n$$

Reason: since the algorithm learns all features automatically, it can learn a constant feature itself if needed.

User–Movie Rating Matrix

| User  | Titanic | The Notebook | Avengers | John Wick |
|-------|---------|--------------|----------|-----------|
| Alice | ⭐⭐⭐⭐⭐ | ⭐ | ⭐ | ⭐ |
| Bob   | ⭐⭐⭐⭐ | ? | ⭐ | ⭐ |
| Carol | ⭐ | ⭐ | ⭐ | ⭐⭐⭐⭐⭐ |

Predicted rating of movie $i$ by user $j$:

$$\hat{y}^{(i,j)} = \theta^{(j)T} x^{(i)}$$

User Preference Matrix $\theta^{(j)}$

| User  | Likes Romance | Likes Action |
|-------|---------------|--------------|
| Alice | 0.95 | 0.05 |
| Bob   | 0.85 | 0.15 |
| Carol | 0.05 | 0.95 |

  • These features are not manually defined.
  • The algorithm learns them from ratings.

Observation

  • Alice and Bob have similar taste
  • Both dislike action movies
  • Both like romantic movies

So if Bob has not rated The Notebook, we can predict:

  • Bob will probably rate it highly.

The algorithm uses behavior of other users to predict what someone will like.
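Using the toy feature and preference tables above, the prediction rule $\theta^{(j)T} x^{(i)}$ can be sketched in a few lines of NumPy. The matrices below are the illustrative table values, not learned parameters, and the scores are affinity values on the toy 0–1 feature scale rather than star ratings:

```python
import numpy as np

# Toy user preference vectors (rows: Alice, Bob, Carol) from the table above
theta = np.array([
    [0.95, 0.05],  # Alice: likes romance, dislikes action
    [0.85, 0.15],  # Bob: similar taste to Alice
    [0.05, 0.95],  # Carol: likes action
])

# Toy movie feature vectors (rows: Titanic, The Notebook, Avengers, John Wick)
x = np.array([
    [0.90, 0.10],
    [0.95, 0.05],
    [0.10, 0.90],
    [0.05, 0.95],
])

# Predicted affinity for every (movie, user) pair: theta^{(j)T} x^{(i)}
scores = x @ theta.T  # shape (4 movies, 3 users)

bob, notebook, john_wick = 1, 1, 3
print(scores[notebook, bob])   # high score (~0.815): Bob would likely enjoy it
print(scores[john_wick, bob])  # much lower score for an action movie
```

Even though Bob never rated The Notebook, his similarity to Alice pushes the predicted score high, which is exactly the collaborative effect described above.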

Learning Movie Features

If user parameters $\theta^{(j)}$ are known, we can learn movie features $x^{(i)}$.

Minimize prediction error:

$$\min_{x^{(i)}} \sum_{j : r(i,j)=1} \left(\theta^{(j)T} x^{(i)} - y^{(i,j)}\right)^2 + \frac{\lambda}{2} \sum_{k=1}^{n} \left(x_k^{(i)}\right)^2$$

Where:

  • $y^{(i,j)}$ = actual rating user $j$ gave movie $i$
  • $r(i,j) = 1$ if the rating exists
  • $\lambda$ = regularization parameter
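As a sketch, the per-movie objective above translates directly into NumPy. The function name and toy numbers here are mine, not from the post:

```python
import numpy as np

def movie_feature_cost(x_i, theta, y_i, r_i, lam):
    """Regularized cost for one movie's feature vector x_i, with theta fixed.

    theta: (n_u, n) known user parameters; y_i: (n_u,) ratings for this movie;
    r_i: (n_u,) indicator, 1 where user j actually rated the movie.
    """
    preds = theta @ x_i                 # theta^{(j)T} x^{(i)} for every user j
    err = (preds - y_i) * r_i           # only observed ratings contribute
    return np.sum(err ** 2) + (lam / 2) * np.sum(x_i ** 2)

# Toy check: two users whose parameters each pick out one feature
theta = np.array([[1.0, 0.0], [0.0, 1.0]])
cost = movie_feature_cost(np.array([2.0, 3.0]), theta,
                          y_i=np.array([2.0, 3.0]),
                          r_i=np.array([1.0, 1.0]), lam=2.0)
# Predictions are perfect here, so only the regularization
# term (lam/2) * (2^2 + 3^2) = 13 remains.
```

Minimizing this function over `x_i` (e.g. with gradient descent) recovers the movie's features for fixed user parameters.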

Learning All Movie Features

For all movies:

$$\min_{x^{(1)}, \ldots, x^{(n_m)}} \sum_{i=1}^{n_m} \sum_{j : r(i,j)=1} \left(\theta^{(j)T} x^{(i)} - y^{(i,j)}\right)^2 + \frac{\lambda}{2} \sum_{i=1}^{n_m} \sum_{k=1}^{n} \left(x_k^{(i)}\right)^2$$

Collaborative Filtering Algorithm

Chicken-and-Egg Problem

Previously we saw two ideas:

  1. If movie features $x^{(i)}$ are known, we can learn user parameters $\theta^{(j)}$.
  2. If user parameters $\theta^{(j)}$ are known, we can learn movie features $x^{(i)}$.

Instead of alternating between them, collaborative filtering learns both simultaneously.

1. Initialize parameters randomly

Initialize both sets of parameters with small random values:

$$x^{(i)}, \theta^{(j)}$$

2. Minimize the cost function $J(x, \theta)$

If we:

  • fix $x$ and minimize $J$ w.r.t. $\theta$, we recover the user-learning problem.
  • fix $\theta$ and minimize $J$ w.r.t. $x$, we recover the movie-feature-learning problem.

An alternating scheme would repeat these two steps until convergence; collaborative filtering instead optimizes both sets of parameters together in a single minimization.

Minimize the cost with:

  • Gradient descent
  • Advanced optimizers (e.g., Conjugate Gradient, L-BFGS)

The Cost Function $J(x, \theta)$

We combine both learning problems into a single cost function.

$$J(x,\theta) = \frac{1}{2} \sum_{(i,j) : r(i,j)=1} \left(\theta^{(j)T} x^{(i)} - y^{(i,j)}\right)^2 + \frac{\lambda}{2} \sum_{j=1}^{n_u} \sum_{k=1}^{n} \left(\theta_k^{(j)}\right)^2 + \frac{\lambda}{2} \sum_{i=1}^{n_m} \sum_{k=1}^{n} \left(x_k^{(i)}\right)^2$$

Where:

  • $y^{(i,j)}$ = rating user $j$ gave movie $i$
  • $r(i,j) = 1$ if the rating exists, otherwise $0$
  • $x^{(i)}$ = feature vector for movie $i$
  • $\theta^{(j)}$ = parameter vector for user $j$

This objective:

  • penalizes prediction error
  • regularizes user parameters
  • regularizes movie features
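A direct NumPy transcription of $J(x, \theta)$, vectorized over the whole rating matrix. The function name and toy inputs are illustrative choices of mine:

```python
import numpy as np

def collab_cost(X, Theta, Y, R, lam):
    """Joint cost J(x, theta) over all movies and users.

    X: (n_m, n) movie features; Theta: (n_u, n) user parameters;
    Y: (n_m, n_u) ratings; R: (n_m, n_u) indicator of observed ratings.
    """
    err = (X @ Theta.T - Y) * R        # squared error only on observed entries
    return (0.5 * np.sum(err ** 2)
            + (lam / 2) * np.sum(Theta ** 2)   # regularize user parameters
            + (lam / 2) * np.sum(X ** 2))      # regularize movie features

# Sanity check: a perfect one-movie, one-user fit leaves only
# the two regularization terms, (lam/2)*1 + (lam/2)*1 = 2.
J = collab_cost(X=np.array([[1.0, 0.0]]), Theta=np.array([[1.0, 0.0]]),
                Y=np.array([[1.0]]), R=np.array([[1.0]]), lam=2.0)
```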

3. Rating Prediction

Once the model is trained, predicted rating:

$$\hat{y}^{(i,j)} = \theta^{(j)T} x^{(i)}$$

If user $j$ has not rated movie $i$, we predict their rating using this value.

4. Result

The algorithm learns:

  • movie feature vectors $x^{(i)}$
  • user preference vectors $\theta^{(j)}$

from the rating matrix alone, without manually defining movie features.
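A minimal end-to-end sketch of the whole algorithm, assuming a tiny hand-made rating matrix and plain batch gradient descent on $J$. The learning rate, iteration count, latent dimension, and $\lambda$ are arbitrary toy choices, not values from the post:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy rating matrix: rows = movies, columns = users; 0 marks "not rated"
Y = np.array([[5.0, 4.0, 1.0],
              [5.0, 0.0, 1.0],   # second user has not rated this movie
              [1.0, 1.0, 5.0],
              [1.0, 1.0, 5.0]])
R = (Y > 0).astype(float)        # r(i,j) indicator

n, lam, alpha = 2, 0.1, 0.01     # latent features, regularization, step size
X = rng.normal(scale=0.1, size=(Y.shape[0], n))      # movie features
Theta = rng.normal(scale=0.1, size=(Y.shape[1], n))  # user parameters

for _ in range(10000):
    err = (X @ Theta.T - Y) * R              # error only on observed ratings
    X_grad = err @ Theta + lam * X           # dJ/dX
    Theta_grad = err.T @ X + lam * Theta     # dJ/dTheta
    X -= alpha * X_grad                      # update both parameter sets
    Theta -= alpha * Theta_grad              # in the same step

pred = X @ Theta.T
print(pred[1, 1])  # the missing rating, filled in from collaborative structure
```

The second user's unrated movie gets a high predicted rating because their observed ratings resemble the first user's, mirroring the Bob/The Notebook example above.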

