K-Means Clustering
K-Means is a powerful unsupervised learning algorithm for clustering data into coherent subsets. It iteratively assigns points to the nearest centroid and updates centroids to minimize distortion, making it widely used in practice.
Principal Component Analysis (PCA) Explained
Anomaly Detection: Identifying Rare and Unusual Patterns in Data
K-Means Clustering
K- Mean is used to solves the clustering problem:
- one of the most widely used clustering algorithms.
- K-Means is an Iterative Algorithm
It is simple, fast, and widely used in practice.

Scikit Learn k-mean example:
from sklearn.cluster import KMeans
import numpy as np
X = np.array([[1, 2], [1, 4], [1, 0],
[10, 2], [10, 4], [10, 0]])
kmeans = KMeans(
n_clusters=2, # Number of Cluster
random_state=0, # Dont Starts with random centroid locations
n_init="auto") # automatically chooses the number of runs.
.fit(X) # Train Model
# print Cluster centers
kmeans.cluster_centers_
# print groups label for data items
kmeans.labels_
Clustering
Given an *unlabeled dataset automatically group the data into coherent subsets, called clusters.
Cluster analysis is a powerful unsupervised learning technique for discovering groups in data.
When K-Means Works Well
K-Means works best when:
- Clusters are roughly spherical.
- Clusters have similar sizes.
- Features are properly scaled.
Feature scaling is important:
Without scaling, features with larger magnitude dominate distance calculations.
When K-Means Struggles
K-Means performs poorly when:
- Clusters are non-convex.
- Clusters have very different densities.
- Data contains significant outliers.
- Clusters are not well separated.
K-Mean Algorithm
K-Means works as follows:
Takes and unlabeled data as input.
Given:
Training set
where .
And number of clusters
Repeats: {
-
Assignment Step
Assign points to nearest centroid.
-
Centroid update step
Move centroids to mean of assigned points.
Minimizes Cost function (Distortion):
}
Stops when assignments stabilize.
Output is cluster assignments for each example and cluster centroids.
Suppose we want to group data into clusters.

Algorithm
Repeat for :
where T = number of runs
- Randomly initialize centroids
- Run K-Means to convergence
- Compute distortion:
Finally:
Input
- (number of clusters) = 2 in this case
- Dataset , where
- = number of examples
Example: 🍓🍡🍬🍭
Candies scatter on table with bowls to collect them.
1. Initialize Cluster centroids() 🥣
Randmly put 2 Bowls on the table to start collecting candies.
Randomly initialize cluster centroids
Total number of clusters
We have centroids:
- = centroid of cluster
where
- is a point in the same space as the data (i.e., )
Choosing the Number of Clusters (K)
There is no universal rule for choosing .
Proper Random Initialization
When running K-Means:
Where:
- = number of clusters
- = number of training examples
It does not make sense to choose .
Recommended Initialization Method
- Randomly select distinct training examples.
- Set initial centroids equal to those examples:
where are randomly chosen distinct indices.
This ensures centroids start within the data distribution.
Elbow Method
When we increase , distortion decreases because we have more centroids to fit the data.
Idea is to find point where adding more clusters does not significantly reduce distortion.

- Run K-Means for different values of .
- Compute distortion for each.
- Plot vs .
- Look for an “elbow” where improvement slows down.
Repeatedly Steps 2-3 until convergence {
2. 📏 Cluster Assignment Step
For each training example() where :
Assign to the cluster with the closest centroid.
In Mathematical terms:
Index of Cluster
Index of cluster is Centroid closest to in
Examples:
- If is closest to , then
- If is closest to , then
= CLuster centroid of cluster
Value of centroid of cluster assigned to is
- If , then
- If , then
Which can be expressed as:
Formally:
Where:
- = centroid of cluster
- = cluster assigned to example
We typically use squared Euclidean distance.

Example
Given
Centroids:
Training example:
Distance to
Distance to
Distance to
Assigning Cluster
- Distance to : 4
- Distance to : 8
- Distance to : 25
The smallest distance is to so
3. 🧿 Move Centroid Step
For each cluster where :
- = total number of clusters
Update centroid to be the mean of all points assigned to it.
Where:
- = set of points assigned to cluster
- = number of points in cluster
This moves the centroid to the center of its assigned points.
- Each centroid becomes the mean of the points assigned to it.
⛔ Stop Condition
The algorithm converges when:
- Cluster assignments no longer change, or
- Centroids stop moving.
What If a Cluster Gets No Points?
If some centroid has zero assigned points:
- Eliminate the cluster (resulting in clusters), or
- Reinitialize the centroid randomly
In practice, empty clusters are uncommon with reasonable initialization.
💰 Cost Function (Distortion)
Minimize th Distance between points and their assigned centroids.
Or equivalently:
Where:
- = cost function (distortion)
- = number of training examples
- = -th training example
- = centroid of the cluster assigned to
- = index of cluster assigned to
Each iteration consists of:
-
Assignment step
Minimizes with respect to (cluster assignments). -
Centroid update step
Minimizes with respect to (centroid locations).
Since each step does not increase , the algorithm is guaranteed to converge.
Each iteration of K-Means guarantees that does not increase.
However, it may converge to a local minimum, not necessarily the global minimum.
A good solution:
- Each natural cluster is captured by one centroid.
A bad solution:
- Merge two true clusters into one
- Split one true cluster into multiple parts
- Assign very few points to some clusters
All of these correspond to higher distortion:
To reduce the risk of poor local minima:
