Collaborative Filtering: Building Recommender Systems with Feature Learning

Learn how collaborative filtering powers modern recommender systems by simultaneously learning user preferences and item features from rating data. Understand the optimization objective, matrix factorization approach, and how gradient-based methods enable scalable recommendations.

Written by Hitesh Sahu, a passionate developer and blogger.

Fri Feb 27 2026

Collaborative Filtering 🫱🏻‍🫲🏽

Collaborative filtering is a recommender system technique that learns both user preferences and item features automatically from rating data.

  • Unlike content-based methods, we do not know the features of movies beforehand.

  • Collaborative filtering learns hidden features of users and items so it can predict missing ratings and recommend things people will likely enjoy.

Chicken-and-Egg Problem 🥚

Previously we saw two ideas:

  1. If movie features $x^{(i)}$ are known, we can learn user parameters $\theta^{(j)}$.
  2. If user parameters $\theta^{(j)}$ are known, we can learn movie features $x^{(i)}$.

Each step needs the output of the other. One way out is to start from a random guess and alternate between the two steps. Collaborative filtering goes further and learns both simultaneously by minimizing a single cost function.

Why Is It Called Collaborative?

Many users rate movies.

Their ratings collaboratively help the system learn features.

Result:

  • Better movie representations
  • Better recommendations for everyone
flowchart TD
    A[Randomly Initialize User Preferences θ]
    --> B[Learn Movie Features x]
    --> C[Update User Preferences θ]
    --> D[Update Movie Features x]
    --> E[Repeat Until Convergence]

Key Idea

People with similar tastes tend to like similar things.

Collaborative filtering simultaneously learns:

  • user preferences $\theta$
  • item features $x$

directly from the rating matrix, without manually defining features.

flowchart TD
    A[Users Rate Movies]
    --> B[Learn User Preferences Vector]
    --> C[Learn Movie Features Vector]
    --> D[Predict & Update Missing Ratings for new Movies]
    --> E[Generate New Recommendations]

Movie Feature Matrix $x^{(i)}$

| Movie | Romance | Action |
|---|---|---|
| Titanic | 0.9 | 0.1 |
| The Notebook | 0.95 | 0.05 |
| Avengers | 0.1 | 0.9 |
| John Wick | 0.05 | 0.95 |

Each row is the feature vector of one movie. For Titanic:

$$x^{(1)} = \begin{bmatrix} 0.9 \\ 0.1 \end{bmatrix}$$

where

  • $x_1$ = romance level
  • $x_2$ = action level

From $x^{(1)}$ we infer:

  • The movie is strongly romantic
  • The movie has very little action

No Intercept Term

Unlike previous models:

  • We remove the intercept feature $x_0 = 1$, so $x^{(i)} \in \mathbb{R}^n$ and $\theta^{(j)} \in \mathbb{R}^n$.

Reason: since the algorithm learns all features automatically, it can learn a constant feature itself if needed.

User Movie Rating Matrix

| User | Titanic | The Notebook | Avengers | John Wick |
|---|---|---|---|---|
| Alice | ⭐⭐⭐⭐⭐ | ⭐ | ⭐ | ⭐ |
| Bob | ⭐⭐⭐⭐ | ? | ⭐ | ⭐ |
| Carol | ⭐ | ⭐ | ⭐ | ⭐⭐⭐⭐⭐ |

A "?" marks a rating that is missing.

Predicted rating of user $j$ for movie $i$:

$$\hat{y}^{(i,j)} = (\theta^{(j)})^T x^{(i)}$$

User Preferences Matrix $\theta^{(j)}$

| User | Likes Romance | Likes Action |
|---|---|---|
| Alice | 0.95 | 0.05 |
| Bob | 0.85 | 0.15 |
| Carol | 0.05 | 0.95 |

  • These features are not manually defined.
  • The algorithm learns them from ratings.

Observation

  • Alice and Bob have similar taste
  • Both dislike action movies
  • Both like romantic movies

So if Bob has not rated The Notebook, we can predict:

  • Bob will probably rate it highly.

The algorithm uses the behavior of other users to predict what someone will like.
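The prediction step can be sketched in a few lines of plain Python. The dictionaries below simply copy the illustrative values from the two tables above, and the helper name `predict` is an assumption made for this example. Because these toy vectors lie between 0 and 1, the result is a relative affinity score rather than a star rating.

```python
# Illustrative values copied from the tables above (not learned from real data).
movie_features = {
    "Titanic": [0.9, 0.1],
    "The Notebook": [0.95, 0.05],
    "Avengers": [0.1, 0.9],
    "John Wick": [0.05, 0.95],
}
user_preferences = {
    "Alice": [0.95, 0.05],
    "Bob": [0.85, 0.15],
    "Carol": [0.05, 0.95],
}

def predict(theta, x):
    """Return the dot product theta^T x for one user and one movie."""
    return sum(t * f for t, f in zip(theta, x))

# Score every movie for Bob and pick the highest one.
bob = user_preferences["Bob"]
scores = {title: predict(bob, x) for title, x in movie_features.items()}
best = max(scores, key=scores.get)
print(best, round(scores[best], 4))
```

With these numbers, Bob's highest score goes to The Notebook, which matches the observation that his taste resembles Alice's.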

Learning Movie Features

If user parameters $\theta^{(j)}$ are known, we can learn the features $x^{(i)}$ of a single movie $i$.

Minimize the regularized prediction error over the users who rated that movie:

$$\min_{x^{(i)}} \frac{1}{2} \sum_{j:r(i,j)=1} \left((\theta^{(j)})^T x^{(i)} - y^{(i,j)}\right)^2 + \frac{\lambda}{2}\sum_{k=1}^{n}\left(x_k^{(i)}\right)^2$$

Where:

  • $y^{(i,j)}$ = actual rating user $j$ gave movie $i$
  • $r(i,j)=1$ if user $j$ has rated movie $i$
  • $\lambda$ = regularization strength
  • $n$ = number of features

Learning All Movie Features

flowchart TD
    A[Current Movie Features x]
    --> B[Predict User Ratings]
    --> C[Compute Error]
    --> D[Compute Gradient]
    --> E[Update Features]
    --> F[Better Predictions]

For all movies:

$$\min_{x^{(1)},\dots,x^{(n_m)}} \frac{1}{2} \sum_{i=1}^{n_m} \sum_{j:r(i,j)=1} \left((\theta^{(j)})^T x^{(i)} - y^{(i,j)}\right)^2 + \frac{\lambda}{2}\sum_{i=1}^{n_m}\sum_{k=1}^{n}\left(x_k^{(i)}\right)^2$$

where $n_m$ is the number of movies.

Gradient descent update:

New Feature = Old Feature - Learning Rate × Gradient

We are updating each feature value $x_k^{(i)}$, the $k$-th feature of movie $i$.

Predicted rating:

$$(\theta^{(j)})^T x^{(i)}$$

This is:

User Preferences · Movie Features = Predicted Rating

Error Term

The prediction error for one rating is the predicted rating minus the actual rating:

$$(\theta^{(j)})^T x^{(i)} - y^{(i,j)}$$

The cost function squares this error, so the gradient is proportional to the error itself.

If:

  • error is large → update more
  • error is small → update less

Regularization ($\lambda$)

The regularization term contributes

$$\lambda x_k^{(i)}$$

to the gradient. It prevents features from becoming too large and helps reduce overfitting.
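Putting the error term and the regularization term together, a single gradient descent step for one movie's features (with the user parameters held fixed) might look like the sketch below. The function names `movie_cost` and `update_movie_features`, the learning rate `alpha`, and the toy ratings are all assumptions made for illustration.

```python
def predict(theta, x):
    """Predicted rating: dot product of user parameters and movie features."""
    return sum(t * f for t, f in zip(theta, x))

def movie_cost(x, thetas, ratings, lam):
    """Half the squared error over the users who rated this movie, plus an L2 penalty on x."""
    squared_error = sum((predict(th, x) - y) ** 2 for th, y in zip(thetas, ratings))
    return 0.5 * squared_error + 0.5 * lam * sum(v * v for v in x)

def update_movie_features(x, thetas, ratings, alpha, lam):
    """One gradient descent step on x with the user parameters held fixed."""
    new_x = []
    for k in range(len(x)):
        # Error term: sum over raters of (prediction - rating) * theta_k
        grad = sum((predict(th, x) - y) * th[k] for th, y in zip(thetas, ratings))
        # Regularization term: lambda * x_k
        grad += lam * x[k]
        new_x.append(x[k] - alpha * grad)
    return new_x

# Hypothetical data: two users who rated the same movie (ratings scaled to 0..1).
thetas = [[0.95, 0.05], [0.05, 0.95]]
ratings = [1.0, 0.2]
x = [0.1, 0.1]  # small initial feature values

before = movie_cost(x, thetas, ratings, lam=0.1)
x = update_movie_features(x, thetas, ratings, alpha=0.1, lam=0.1)
after = movie_cost(x, thetas, ratings, lam=0.1)
print(before, after)
```

The first user's large error dominates the gradient, so the first feature moves much further than the second in this step.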


Collaborative Filtering Algorithm

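1. Initialize $x$ and $\theta$ randomly.

Initialize with small random values:

$$x^{(1)}, \dots, x^{(n_m)}, \quad \theta^{(1)}, \dots, \theta^{(n_u)}$$

We do this to break symmetry: if every feature started at the same value, all features would receive identical gradient updates and remain identical.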

2. Minimize the cost function $J(x,\theta)$

An alternating scheme would work like this:

  • Estimate movie features: fix $\theta$, learn $x$
  • Estimate user preferences: fix $x$, learn $\theta$
  • Repeat until convergence.

Both steps optimize pieces of the same objective. If we:

  • fix $x$ and minimize $J$ w.r.t. $\theta$, we recover the user learning problem.
  • fix $\theta$ and minimize $J$ w.r.t. $x$, we recover the movie feature learning problem.

So instead of alternating between them, we optimize both together.

Minimize cost with:

  • Gradient Descent
  • Advanced optimizers (e.g., Conjugate Gradient, L-BFGS)
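For plain gradient descent, taking the partial derivatives of $J(x,\theta)$ gives the two updates below, applied simultaneously for every $i$, $j$, and $k$. The learning rate $\alpha$ is a symbol assumed here for illustration.

$$x_k^{(i)} := x_k^{(i)} - \alpha \left( \sum_{j:r(i,j)=1} \left( (\theta^{(j)})^T x^{(i)} - y^{(i,j)} \right) \theta_k^{(j)} + \lambda x_k^{(i)} \right)$$

$$\theta_k^{(j)} := \theta_k^{(j)} - \alpha \left( \sum_{i:r(i,j)=1} \left( (\theta^{(j)})^T x^{(i)} - y^{(i,j)} \right) x_k^{(i)} + \lambda \theta_k^{(j)} \right)$$

Each update has the same shape: a sum of prediction errors weighted by the other side's parameters, plus a regularization term.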
Cost Function $J(x,\theta)$

We combine both learning problems into a single cost function.

$$J(x,\theta) = \frac{1}{2} \sum_{(i,j):r(i,j)=1} \left((\theta^{(j)})^T x^{(i)} - y^{(i,j)}\right)^2 + \frac{\lambda}{2} \sum_{j=1}^{n_u}\sum_{k=1}^{n}\left(\theta_k^{(j)}\right)^2 + \frac{\lambda}{2} \sum_{i=1}^{n_m}\sum_{k=1}^{n}\left(x_k^{(i)}\right)^2$$

Where:

  • $y^{(i,j)}$ = rating user $j$ gave movie $i$
  • $r(i,j)=1$ if rating exists, otherwise $0$
  • $x^{(i)}$ = feature vector for movie $i$
  • $\theta^{(j)}$ = parameter vector for user $j$
  • $n_m$ = number of movies, $n_u$ = number of users, $n$ = number of features

This objective:

  • penalizes prediction error
  • regularizes user parameters
  • regularizes movie features
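As a rough sketch, the cost function above can be evaluated directly with nested loops. The function name `cost`, the list-of-lists layout (rows are movies, columns are users), and the toy numbers are assumptions for this example, not part of the original formulation.

```python
def cost(X, Theta, Y, R, lam):
    """Collaborative filtering cost J(x, theta).

    X[i]     : feature vector of movie i
    Theta[j] : parameter vector of user j
    Y[i][j]  : rating user j gave movie i (ignored when R[i][j] == 0)
    R[i][j]  : 1 if user j rated movie i, else 0
    """
    squared_error = 0.0
    for i, x in enumerate(X):
        for j, theta in enumerate(Theta):
            if R[i][j] == 1:  # only observed ratings count
                prediction = sum(t * f for t, f in zip(theta, x))
                squared_error += (prediction - Y[i][j]) ** 2
    theta_penalty = sum(v * v for theta in Theta for v in theta)
    x_penalty = sum(v * v for x in X for v in x)
    return 0.5 * squared_error + 0.5 * lam * (theta_penalty + x_penalty)

# Hypothetical toy problem: 2 movies, 2 users, one missing rating.
X = [[1.0, 0.0], [0.0, 1.0]]
Theta = [[5.0, 0.0], [0.0, 4.0]]
Y = [[5, 0], [1, 4]]  # Y[0][1] is a placeholder: user 1 never rated movie 0
R = [[1, 0], [1, 1]]

print(cost(X, Theta, Y, R, lam=0.0))
print(cost(X, Theta, Y, R, lam=1.0))
```

Setting `lam=0.0` isolates the prediction error, while a positive `lam` adds the penalty on both parameter sets.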

3. Rating Prediction

Once the model is trained, the predicted rating is:

$$\hat{y}^{(i,j)} = (\theta^{(j)})^T x^{(i)}$$

If user $j$ has not rated movie $i$, we predict their rating using this value.

4. Result

The algorithm learns:

  • movie feature vectors $x^{(i)}$
  • user preference vectors $\theta^{(j)}$

from the rating matrix alone, without manually defining movie features.
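The whole procedure (random initialization, joint minimization, prediction) can be sketched end to end in plain Python. Everything below is an illustrative assumption rather than a reference implementation: the function name `train`, the hyperparameter values, and the hypothetical rating matrix, which was chosen so that two users share a taste for the first two movies.

```python
import random

def predict(theta, x):
    """Predicted rating: dot product of user parameters and movie features."""
    return sum(t * f for t, f in zip(theta, x))

def train(Y, R, n_features=2, alpha=0.01, lam=0.1, iterations=5000, seed=0):
    """Jointly learn movie features X and user parameters Theta by gradient descent."""
    rng = random.Random(seed)
    n_movies, n_users = len(Y), len(Y[0])
    # Small random values break symmetry between features.
    X = [[rng.uniform(-0.1, 0.1) for _ in range(n_features)] for _ in range(n_movies)]
    Theta = [[rng.uniform(-0.1, 0.1) for _ in range(n_features)] for _ in range(n_users)]

    for _ in range(iterations):
        # Start each gradient with its regularization term (lambda * parameter).
        grad_X = [[lam * v for v in x] for x in X]
        grad_Theta = [[lam * v for v in theta] for theta in Theta]
        for i in range(n_movies):
            for j in range(n_users):
                if R[i][j] == 1:  # only observed ratings contribute
                    error = predict(Theta[j], X[i]) - Y[i][j]
                    for k in range(n_features):
                        grad_X[i][k] += error * Theta[j][k]
                        grad_Theta[j][k] += error * X[i][k]
        # Update both sets of parameters together, using the old values for both gradients.
        X = [[v - alpha * g for v, g in zip(x, gx)] for x, gx in zip(X, grad_X)]
        Theta = [[v - alpha * g for v, g in zip(t, gt)] for t, gt in zip(Theta, grad_Theta)]
    return X, Theta

# Hypothetical ratings (rows = movies, columns = users); 0 marks a missing rating.
Y = [[5, 5, 1],
     [5, 0, 1],
     [1, 1, 5],
     [1, 1, 5]]
R = [[1, 1, 1],
     [1, 0, 1],
     [1, 1, 1],
     [1, 1, 1]]

X, Theta = train(Y, R)
missing = predict(Theta[1], X[1])  # user 1 never rated movie 1
print(round(missing, 2))
```

Both gradients are computed before either set of parameters changes, so each iteration is a step on the joint objective rather than an alternating update. For larger data sets, a vectorized library or one of the advanced optimizers mentioned earlier would be a more practical choice.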

