

Collaborative Filtering: Building Recommender Systems with Feature Learning

Learn how collaborative filtering powers modern recommender systems by simultaneously learning user preferences and item features from rating data. Understand the optimization objective, matrix factorization approach, and how gradient-based methods enable scalable recommendations.

Written by Hitesh Sahu, a passionate developer and blogger.

Fri Feb 27 2026


Collaborative Filtering

Collaborative filtering is a recommender-system technique that learns both user preferences and item features automatically from rating data.

Unlike content-based methods, we do not know the features of movies beforehand.

Collaborative filtering learns hidden features of users and items so it can predict missing ratings and recommend things people will likely enjoy.

Why It’s Called Collaborative

Many users rate movies.

Their ratings collaboratively help the system learn features.

Result:

  • Better movie representations
  • Better recommendations for everyone

Key Idea

People with similar tastes tend to like similar things.

Collaborative filtering simultaneously learns:

  • user preferences $\theta$
  • item features $x$

directly from the rating matrix, without manually defining features.

Movie Feature Matrix $x^{(i)}$

| Movie     | Romance | Action |
|-----------|---------|--------|
| Titanic   | 0.9     | 0.1    |
| Notebook  | 0.95    | 0.05   |
| Avengers  | 0.1     | 0.9    |
| John Wick | 0.05    | 0.95   |

$$x^{(1)} = \begin{bmatrix} 0.9 \\ 0.1 \end{bmatrix}$$

where

  • $x_1$ = romance level
  • $x_2$ = action level

From this we infer $x^{(1)}$:

  • Movie is romantic
  • Movie is not action

No Intercept Term

Unlike previous models:

  • We remove the intercept feature $x_0 = 1$.

$$x^{(i)} \in \mathbb{R}^n, \qquad \theta^{(j)} \in \mathbb{R}^n$$

Reason: since the algorithm learns all features automatically, it can learn a constant feature itself if needed.

User–Movie Rating Matrix

| User  | Titanic | The Notebook | Avengers | John Wick |
|-------|---------|--------------|----------|-----------|
| Alice | ⭐⭐⭐⭐⭐ | ⭐ | ⭐ | ⭐ |
| Bob   | ⭐⭐⭐⭐ | ? | ⭐ | ⭐ |
| Carol | ⭐ | ⭐ | ⭐ | ⭐⭐⭐⭐⭐ |

Predicted rating of movie $i$ by user $j$:

$$\hat{y}^{(i,j)} = \theta^{(j)T} x^{(i)}$$

User Preference Matrix $\theta^{(j)}$

| User  | Likes Romance | Likes Action |
|-------|---------------|--------------|
| Alice | 0.95 | 0.05 |
| Bob   | 0.85 | 0.15 |
| Carol | 0.05 | 0.95 |

  • These features are not manually defined.
  • The algorithm learns them from ratings.

Observation

  • Alice and Bob have similar taste
  • Both dislike action movies
  • Both like romantic movies

So if Bob has not rated The Notebook, we can predict:

  • Bob will probably rate it highly.

The algorithm uses behavior of other users to predict what someone will like.
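Using the toy feature and preference tables above, the prediction rule $\theta^{(j)T} x^{(i)}$ can be sketched in a few lines of NumPy. The matrices below are the illustrative table values, not learned parameters, and the scores are affinity values on the toy 0–1 feature scale rather than star ratings:

```python
import numpy as np

# Toy user preference vectors (rows: Alice, Bob, Carol) from the table above
theta = np.array([
    [0.95, 0.05],  # Alice: likes romance, dislikes action
    [0.85, 0.15],  # Bob: similar taste to Alice
    [0.05, 0.95],  # Carol: likes action
])

# Toy movie feature vectors (rows: Titanic, The Notebook, Avengers, John Wick)
x = np.array([
    [0.90, 0.10],
    [0.95, 0.05],
    [0.10, 0.90],
    [0.05, 0.95],
])

# Predicted affinity for every (movie, user) pair: theta^{(j)T} x^{(i)}
scores = x @ theta.T  # shape (4 movies, 3 users)

bob, notebook, john_wick = 1, 1, 3
print(scores[notebook, bob])   # high score (~0.815): Bob would likely enjoy it
print(scores[john_wick, bob])  # much lower score for an action movie
```

Even though Bob never rated The Notebook, his similarity to Alice pushes the predicted score high, which is exactly the collaborative effect described above.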

Learning Movie Features

If user parameters $\theta^{(j)}$ are known, we can learn movie features $x^{(i)}$.

Minimize prediction error:

$$\min_{x^{(i)}} \sum_{j : r(i,j)=1} \left(\theta^{(j)T} x^{(i)} - y^{(i,j)}\right)^2 + \frac{\lambda}{2} \sum_{k=1}^{n} \left(x_k^{(i)}\right)^2$$

Where:

  • $y^{(i,j)}$ = actual rating user $j$ gave movie $i$
  • $r(i,j) = 1$ if the rating exists
  • $\lambda$ = regularization parameter
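As a sketch, the per-movie objective above translates directly into NumPy. The function name and toy numbers here are mine, not from the post:

```python
import numpy as np

def movie_feature_cost(x_i, theta, y_i, r_i, lam):
    """Regularized cost for one movie's feature vector x_i, with theta fixed.

    theta: (n_u, n) known user parameters; y_i: (n_u,) ratings for this movie;
    r_i: (n_u,) indicator, 1 where user j actually rated the movie.
    """
    preds = theta @ x_i                 # theta^{(j)T} x^{(i)} for every user j
    err = (preds - y_i) * r_i           # only observed ratings contribute
    return np.sum(err ** 2) + (lam / 2) * np.sum(x_i ** 2)

# Toy check: two users whose parameters each pick out one feature
theta = np.array([[1.0, 0.0], [0.0, 1.0]])
cost = movie_feature_cost(np.array([2.0, 3.0]), theta,
                          y_i=np.array([2.0, 3.0]),
                          r_i=np.array([1.0, 1.0]), lam=2.0)
# Predictions are perfect here, so only the regularization
# term (lam/2) * (2^2 + 3^2) = 13 remains.
```

Minimizing this function over `x_i` (e.g. with gradient descent) recovers the movie's features for fixed user parameters.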

Learning All Movie Features

For all movies:

$$\min_{x^{(1)}, \ldots, x^{(n_m)}} \sum_{i=1}^{n_m} \sum_{j : r(i,j)=1} \left(\theta^{(j)T} x^{(i)} - y^{(i,j)}\right)^2 + \frac{\lambda}{2} \sum_{i=1}^{n_m} \sum_{k=1}^{n} \left(x_k^{(i)}\right)^2$$

Collaborative Filtering Algorithm

Chicken-and-Egg Problem

Previously we saw two ideas:

  1. If movie features $x^{(i)}$ are known, we can learn user parameters $\theta^{(j)}$.
  2. If user parameters $\theta^{(j)}$ are known, we can learn movie features $x^{(i)}$.

Instead of alternating between them, collaborative filtering learns both simultaneously.

1. Initialize parameters randomly

Initialize both sets of parameters with small random values:

$$x^{(i)}, \theta^{(j)}$$

2. Minimize the cost function $J(x, \theta)$

If we:

  • fix $x$ and minimize $J$ w.r.t. $\theta$, we recover the user-learning problem.
  • fix $\theta$ and minimize $J$ w.r.t. $x$, we recover the movie-feature-learning problem.

An alternating scheme would repeat these two steps until convergence; collaborative filtering instead optimizes both sets of parameters together in a single minimization.

Minimize the cost with:

  • Gradient descent
  • Advanced optimizers (e.g., Conjugate Gradient, L-BFGS)

The Cost Function $J(x, \theta)$

We combine both learning problems into a single cost function.

$$J(x,\theta) = \frac{1}{2} \sum_{(i,j) : r(i,j)=1} \left(\theta^{(j)T} x^{(i)} - y^{(i,j)}\right)^2 + \frac{\lambda}{2} \sum_{j=1}^{n_u} \sum_{k=1}^{n} \left(\theta_k^{(j)}\right)^2 + \frac{\lambda}{2} \sum_{i=1}^{n_m} \sum_{k=1}^{n} \left(x_k^{(i)}\right)^2$$

Where:

  • $y^{(i,j)}$ = rating user $j$ gave movie $i$
  • $r(i,j) = 1$ if the rating exists, otherwise $0$
  • $x^{(i)}$ = feature vector for movie $i$
  • $\theta^{(j)}$ = parameter vector for user $j$

This objective:

  • penalizes prediction error
  • regularizes user parameters
  • regularizes movie features
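A direct NumPy transcription of $J(x, \theta)$, vectorized over the whole rating matrix. The function name and toy inputs are illustrative choices of mine:

```python
import numpy as np

def collab_cost(X, Theta, Y, R, lam):
    """Joint cost J(x, theta) over all movies and users.

    X: (n_m, n) movie features; Theta: (n_u, n) user parameters;
    Y: (n_m, n_u) ratings; R: (n_m, n_u) indicator of observed ratings.
    """
    err = (X @ Theta.T - Y) * R        # squared error only on observed entries
    return (0.5 * np.sum(err ** 2)
            + (lam / 2) * np.sum(Theta ** 2)   # regularize user parameters
            + (lam / 2) * np.sum(X ** 2))      # regularize movie features

# Sanity check: a perfect one-movie, one-user fit leaves only
# the two regularization terms, (lam/2)*1 + (lam/2)*1 = 2.
J = collab_cost(X=np.array([[1.0, 0.0]]), Theta=np.array([[1.0, 0.0]]),
                Y=np.array([[1.0]]), R=np.array([[1.0]]), lam=2.0)
```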

3. Rating Prediction

Once the model is trained, predicted rating:

$$\hat{y}^{(i,j)} = \theta^{(j)T} x^{(i)}$$

If user $j$ has not rated movie $i$, we predict their rating using this value.

4. Result

The algorithm learns:

  • movie feature vectors $x^{(i)}$
  • user preference vectors $\theta^{(j)}$

from the rating matrix alone, without manually defining movie features.
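A minimal end-to-end sketch of the whole algorithm, assuming a tiny hand-made rating matrix and plain batch gradient descent on $J$. The learning rate, iteration count, latent dimension, and $\lambda$ are arbitrary toy choices, not values from the post:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy rating matrix: rows = movies, columns = users; 0 marks "not rated"
Y = np.array([[5.0, 4.0, 1.0],
              [5.0, 0.0, 1.0],   # second user has not rated this movie
              [1.0, 1.0, 5.0],
              [1.0, 1.0, 5.0]])
R = (Y > 0).astype(float)        # r(i,j) indicator

n, lam, alpha = 2, 0.1, 0.01     # latent features, regularization, step size
X = rng.normal(scale=0.1, size=(Y.shape[0], n))      # movie features
Theta = rng.normal(scale=0.1, size=(Y.shape[1], n))  # user parameters

for _ in range(10000):
    err = (X @ Theta.T - Y) * R              # error only on observed ratings
    X_grad = err @ Theta + lam * X           # dJ/dX
    Theta_grad = err.T @ X + lam * Theta     # dJ/dTheta
    X -= alpha * X_grad                      # update both parameter sets
    Theta -= alpha * Theta_grad              # in the same step

pred = X @ Theta.T
print(pred[1, 1])  # the missing rating, filled in from collaborative structure
```

The second user's unrated movie gets a high predicted rating because their observed ratings resemble the first user's, mirroring the Bob/The Notebook example above.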

